Wednesday, April 17, 2013

Design of Mesa 3D Part 4: Lexing Shaders

I'm going to skip over some of the pieces that I've mentioned already (namely creating a shader and attaching source to the shader) since those work in much the same way as I've already described. Now I'd like to jump into the fun part: GLSL lexing and parsing! The relevant entry point is _slang_compile(), found in src/mesa/shader/slang/slang_compile.c. This function delegates to compile_shader(), which delegates to compile_object(), which is where the fun really starts. The first thing this function calls is grammar_load_from_text((const byte *) (slang_shader_syn)). slang_shader_syn is defined like this:

LONGSTRING static const char *slang_shader_syn =
#include "library/slang_shader_syn.h"
;

Alright, well let's take a look at that file. Opening it up, we see stuff like this:

/* DO NOT EDIT - THIS FILE IS AUTOMATICALLY GENERATED FROM THE .syn FILE */
".syntax translation_unit;\n"
".emtcode REVISION 5\n"
".emtcode EXTERNAL_NULL 0\n"
".emtcode EXTERNAL_FUNCTION_DEFINITION 1\n"
".emtcode EXTERNAL_DECLARATION 2\n"
".emtcode DEFAULT_PRECISION 3\n"
...

Clearly this is a long string, but I don't recognize the language it's written in. Let's take a look at the .syn file (src/mesa/shader/slang/library/slang_shader.syn):

.syntax translation_unit;
/* revision number - increment after each change affecting emitted output */
.emtcode REVISION                                   5
/* external declaration (or precision or invariant stmt) */
.emtcode EXTERNAL_NULL                              0
.emtcode EXTERNAL_FUNCTION_DEFINITION               1
.emtcode EXTERNAL_DECLARATION                       2
.emtcode DEFAULT_PRECISION                          3
...

This actually looks almost exactly the same as the ".h" file, with some comments at the top. The comments explain that the translation from the .syn file to the .h file is done with src/mesa/shader/slang/library/syn_to_c.c, so let's take a look at the converter. The entire source isn't that long, so it's easy to see that the program simply removes comments and adds escape characters, allowing the source to be #included as a string. Straightforward enough.

However, we still don't understand the meaning of the source of the .syn file. Let's see if we can glean any information from how it's used, so let's look at grammar_load_from_text(), defined in src/mesa/shader/grammar/grammar.c. That file actually has a very helpful (and long) comment at the top explaining exactly what the language is and what kind of thing it describes. I won't copy and paste the entire thing here, but I will give an executive summary:

The file is a collection of declarations, which are essentially rules in a formal language, and each rule is quite simple: a literal character in the body of a declaration means that the next character in the input stream must be that particular character; if the next character doesn't match, the rule fails. A declaration's body is a list of specifiers joined by either the ".and" or the ".or" keyword. The language also lets a specifier, when it matches successfully, emit a character. This language therefore defines a transformation from an input string to an output string. The comment also gives a little justification for why this language exists: it claims that describing GLSL directly in C code is error-prone, so instead, the description of GLSL should live in some other language. grammar_load_from_text() essentially parses a description of a language (which is itself written in the language I've just described, called "Synek") and constructs a function from a stream of characters to a stream of characters. Alright.

So now, let's get back to compile_object(). The next thing this function does is run a few invocations of compile_binary(), each of which is run on something like "slang_core_gc" or "slang_120_core_gc". These symbols are defined similarly to slang_shader_syn:

static const byte slang_core_gc[] = {
#include "library/slang_core_gc.h"
};

Let's take a look at this file:

5,1,90,95,0,0,5,0,1,1,1,0,0,9,0,102,0,0,0,1,4,118,101,99,52,95,116,111,95,105,118,101,99,52,0,18,
95,95,114,101,116,86,97,108,0,0,18,102,0,0,0,0,1,90,95,0,0,5,0,1,1,1,0,0,1,0,98,0,0,0,1,9,18,95,95,
114,101,116,86,97,108,0,18,98,0,20,0,0,1,90,95,0,0,5,0,1,1,1,0,0,5,0,105,0,0,0,1,9,18,95,95,114,
101,116,86,97,108,0,18,105,0,20,0,0,1,90,95,0,0,1,0,1,1,1,0,0,5,0,105,0,0,0,1,4,118,101,99,52,95,
115,110,101,0,18,95,95,114,101,116,86,97,108,0,0,18,105,0,0,17,48,0,48,0,0,0,0,0,1,90,95,0,0,1,0,1,
...

Well that's unhelpful. A comment at the top of the file says that the file was generated from slang_core.gc, so let's take a look at that:

int __constructor(const float f)
{
   __asm vec4_to_ivec4 __retVal, f;
}
int __constructor(const bool b)
{
   __retVal = b;
}
int __constructor(const int i)
{
   __retVal = i;
}
...

Interesting. This looks like GLSL! Except, it looks like it's definitions of symbols that are built into the GLSL language. It's straightforward enough to look at the program that generated slang_core_gc.h from slang_core.gc (src/mesa/shader/slang/library/gc_to_bin.c). I won't copy the source here, but the file is quite simple. It essentially just opens up the input file, calls grammar_fast_check() on the source of the input file, then outputs the string that that function produces. grammar_fast_check(), defined in src/mesa/shader/grammar/grammar.c, is just the function that "runs" the grammar, outputting the sequence of characters that the Synek grammar describes. In total, this means that the builtin functions are pre-lexed, so libGL doesn't have to do this at runtime. Smart! We can also see, back in compile_object(), that each invocation of compile_binary() puts its output into object->builtin[x]. Cool!

The last thing that compile_object() does is run compile_with_grammar(), which runs grammar_fast_check(), the same function that gc_to_bin.c ran. So, we're running all our GLSL code through the same lexer; the only difference is that the builtin functions get run through the lexer at compile time, while the user-specified functions get run through the lexer at runtime. Cool! One difference, however, is that the user-specified shader has to go through a preprocessing pass, because it might contain preprocessor macros. The builtin code doesn't have any preprocessor macros, so it doesn't need that pass.

Alright, that doesn't actually finish the story, however. Synek simply tokenizes the input; it doesn't parse it. At this point, we don't have a sequence of instructions to run; we only have a sequence of tokens that represents the input. The missing piece is at the end of compile_with_grammar(), namely, a call to compile_binary(). Note that this call covers all the lexed source, including the builtins. We still, however, have to run a translation to create instructions that our Mesa virtual machine can execute. I'll save that for next time!
