Saturday, April 20, 2013

Design of Mesa 3D Part 5: Parsing Shaders

Last time I described how GLSL source gets lexed and turned into a stream of tokens, isomorphic with the source of the input shader. However, a stream of tokens representing the input isn't really anywhere near what we want, which is a sequence of assembly commands that we can execute. The next step, once you have a stream of tokens, is to create a data structure representing the program, which is isomorphic with the token stream (called an abstract syntax tree). The actual conversion here isn't really the most interesting piece (though I'll mention a couple interesting parts of the conversion function); instead, I'd like to describe the data structure that represents a program. Once we understand the relevant data structure, filling it in from a sequence of tokens is straightforward.

Most parsing functions take just two arguments: an input slang_parse_ctx and a slang_output_ctx. The slang_parse_ctx is straightforward, it essentially wraps the output of the lexer with a few metadata options. Here's its definition, from src/mesa/shader/slang/slang_compile.c.

typedef struct slang_parse_ctx_
   const byte *I;
   slang_info_log *L;
   int parsing_builtin;
   GLboolean global_scope;   /**< Is object being declared a global? */
   slang_atom_pool *atoms;
   slang_unit_type type;     /**< Vertex vs. Fragment */
   GLuint version;           /**< user-specified (or default) #version */
} slang_parse_ctx;

I is the input stream of symbols that the lexer produced, and everything else is fairly straightforward. The output of the parser is a slang_output_ctx; here's the definition of that struct:

typedef struct slang_output_ctx_
   slang_variable_scope *vars;
   slang_function_scope *funs;
   slang_struct_scope *structs;
   struct gl_program *program;
   struct gl_sl_pragmas *pragmas;
   slang_var_table *vartable;
   GLuint default_precision[TYPE_SPECIFIER_COUNT];
   GLboolean allow_precision;
   GLboolean allow_invariant;
   GLboolean allow_centroid;
   GLboolean allow_array_types;  /* float[] syntax */
} slang_output_ctx;

As you can see, this is the representation of the program. The lines that parse the GLboolean fields are straightforward as you would imagine; they just check a single token in the lexer's output. The scope variables are defined as a linked list from one scope to an outer scope; here's an example from src/mesa/shader/slang/slang_compile_variable.h:

typedef struct slang_variable_scope_
   slang_variable **variables;  /**< Array [num_variables] of ptrs to vars */
   GLuint num_variables;
   struct slang_variable_scope_ *outer_scope;
} slang_variable_scope;

This means that if you want to locate a variable, you just have to walk the linked list of scopes to find it, similar to walking a prototype chain.

Each individual variable is defined like this in the same file:

typedef struct slang_variable_
   slang_fully_specified_type type; /**< Variable's data type */
   slang_atom a_name;               /**< The variable's name (char *) */
   GLuint array_len;                /**< only if type == SLANG_SPEC_ARRAy */
   struct slang_operation_ *initializer; /**< Optional initializer code */
   GLuint size;                     /**< Variable's size in bytes */
   GLboolean is_global;
   GLboolean isTemp;                /**< a named temporary (__resultTmp) */
   GLboolean declared;              /**< for debug */
   struct slang_ir_storage_ *store; /**< Storage for this var */
} slang_variable;

A slang_atom is simply typedef'ed to a void*. Let's look at how types are represented, in src/mesa/shader/slang/slang_typeinfo.h:

typedef struct slang_fully_specified_type_
   slang_type_qualifier qualifier;
   slang_type_specifier specifier;
   slang_type_precision precision;
   slang_type_variant variant;
   slang_type_centroid centroid;
   GLint array_len;           /**< -1 if not an array type */
} slang_fully_specified_type;

The qualifier is an enum with items "const", "attribute", "varying", "uniform", "out", and "inout". The specifier is the following struct:

typedef struct slang_type_specifier_
   slang_type_specifier_type type;
   struct slang_struct_ *_struct;         /**< if type == SLANG_SPEC_STRUCT */
   struct slang_type_specifier_ *_array;  /**< if type == SLANG_SPEC_ARRAY */
} slang_type_specifier;

This is a recursive data structure used to define user-created types in terms of builtin types. The builtin types (slang_type_specifier_type) is an enum with an item for each builtin type: "bool", "int", "vec2", "mat32", etc. Looking back at slang_fully_specified_type, the precision is an enum with "high", "medium", and "low" entries. The variant member is an enum with just "variant" and "invariant". The centroid member is an enum with just "center" and "centroid".

Let's look at the last field in the slang_variable struct, slang_ir_storage_, in src/mesa/shader/slang/slang_ir.h:

struct slang_ir_storage_
   gl_register_file File;  /**< PROGRAM_TEMPORARY, PROGRAM_INPUT, etc */
   GLint Index;    /**< -1 means unallocated */
   GLint Size;     /**< number of floats or ints */
   GLuint Swizzle; /**< Swizzle AND writemask info */
   GLint RefCount; /**< Used during IR tree delete */
   GLboolean RelAddr; /* we'll remove this eventually */
   GLboolean IsIndirect;
   gl_register_file IndirectFile;
   GLint IndirectIndex;
   GLuint IndirectSwizzle;
   GLuint TexTarget;  /**< If File==PROGRAM_SAMPLER, one of TEXTURE_x_INDEX */
   /** If Parent is non-null, Index is relative to parent.
    * The other fields are ignored.
   struct slang_ir_storage_ *Parent;

This represents an offset into a gl_register_file, with some allowances for indirect offsets. The register file, defined in src/mesa/main/mtypes.h, is an enum that specifies which of the various register file to use inside the virtual machine that will eventually perform shader execution.

Cool. Now we understand how variables are represented. What about the other members in the slang_output_context, for example, slang_function_scope? Well, the "scope" piece is set up the same way as for variables (though it's defined in src/mesa/shader/slang/slang_compile_function.h), so I'll skip right to the slang_function struct, defined in the same file. Here it is:

typedef struct slang_function_
   slang_function_kind kind;
   slang_variable header;      /**< The function's name and return type */
   slang_variable_scope *parameters; /**< formal parameters AND local vars */
   unsigned int param_count;   /**< number of formal params (no locals) */
   slang_operation *body;      /**< The instruction tree */
} slang_function;

We've already looked at slang_variable and slang_variable_scope; I think it's neat that they're re-using these structs. The slang_function_kind type is an enum with items for "ordinary", "constructor", and "operator". I'm assuming that the constructors are used for the builtin code that initializes vec3s and mat4s and such. Let's now look at slang_operation, defined in mesa/shader/slang/slang_compile_operation.h:

typedef struct slang_operation_
   slang_operation_type type;
   struct slang_operation_ *children;
   GLuint num_children;
   GLfloat literal[4];           /**< Used for float, int and bool values */
   GLuint literal_size;          /**< 1, 2, 3, or 4 */
   slang_atom a_id;              /**< type: asm, identifier, call, field */
   slang_atom a_obj;             /**< object in a method call */
   slang_variable_scope *locals; /**< local vars for scope */
   struct slang_function_ *fun;  /**< If type == SLANG_OPER_CALL */
   struct slang_variable_ *var;  /**< If type == slang_oper_identier */
   struct slang_label_ *label;   /**< If type == SLANG_OPER_LABEL */
   /** If type==SLANG_OPER_CALL and we're calling an array constructor,
    * for which there's no real function, we need to have a flag to
    * indicate such.  num_children indicates number of elements.
   GLboolean array_constructor;
} slang_operation;

slang_operation_type is a giant enum with, for example, entries meaning "do", "for", "assign", "add", "less", "equal", "minus", "call", etc. You can see that it's a recursive data type, creating a tree of computation. The comments give a pretty good explanation of the usage of each member.

slang_struct_scope is set up in much the same way. Here's the relevant definition, from src/mesa/shader/slang/slang_compile_struct.h:

typedef struct slang_struct_
   slang_atom a_name;
   struct slang_variable_scope_ *fields;
   slang_struct_scope *structs;
   struct slang_function_ *constructor;
} slang_struct;

Most of the parsing functions, in src/mesa/shader/slang/slang_compile*.c, are straightforward: they create relevant data structures and fill in their members based on the output of the lexer. I did want to mention one place, however, that I thought was interesting: parse_statement() in src/mesa/shader/slang/slang_compile.c. This is the bulk of what people usually mean when they say "parse". This function has a switch statement with elements for "new scope", "no new scope", "declare", "if", "while", "for", etc. Each of these cases delegates to parse_child_operation(), passing a "true" or a "false" to its "statement" argument. If "statement" is true, parse_child_operation() recursively calls parse_statement(), but if it's false, it calls parse_expression().  The "for" case, for example, calls parse_child_operation() 4 times, and 3 of those 4 specify "true" and the other one specifies "false".

parse_expression() consumes characters until it receives an OP_END, and, each iteration, reallocs an array of the operations that it constructs. Then, there's a giant switch statement, switching over each type of operation that a shader can perform. If the operation is one that can have sub-operations (for example, "add" takes two sub-operations), the function fills in the type of the operation and then delegates to handle_nary_expression(..., n). This function simply sets up the n child pointers inside the expression objects. I also noticed something: the code that actually sets up these child pointers looks like this:

   for (i = 0; i < n; i++) {
      op->children[i] = (*ops)[*total_ops - (n + 1 - i)];

This means that the operation occurs in postorder, because 'total_ops' is the index of the newly-added op, and n is the number of child ops. This is also the case with the statement operators: the first two parse_child_operations()s specify that they should be parsing statements, then comes an expression, then another statement (which is backwards). This means that the lexer outputs these items in reverse order.

Alright, cool: now we've got a big data structure defining the structure of the GLSL shader. Now we've got to actually generate instructions which will execute the shader!

No comments:

Post a Comment