Sunday, April 21, 2013

Design of Mesa 3D Part 6: Intermediate Representation Translation of Shaders

Previously, I covered lexing and parsing of shaders. At this point, we have a large data structure describing the structure of a shader. Now, we want to convert this data structure (specifically, the main function) into a stream of commands that our VM can execute when the shader runs. We do that by translating the slang_operation_ tree that we created earlier into a similar but simpler tree, which is called an intermediate representation. Then, we can actually emit instructions from this intermediate representation.

Because a slang_operation_ is already a tree of operations, this translation is fairly straightforward. It happens in _slang_codegen_function(), defined in src/mesa/shader/slang/slang_codegen.c.  Interestingly enough, this function makes sure that it's only called on the "main" function, because most other functions should get inlined. The functions that can't get inlined will get codegen'ed upon an actual call to the function. I'll talk more about inlining later.

The first thing that function does is call _slang_simplify(), defined in src/mesa/shader/slang/slang_simplify.c, to do some trivial simplifications. The simplifications are:
  • Replacing references to constant variables with the literal form of those variables
  • Performing addition, subtraction, multiplication, division, negation, logical and, logical or, and logical xor on literal values, and replacing the operation with the result of the computation. This is done bottom-up, so that large constant expressions can be simplified properly
  • Replacing calls to vector constructors that take literal arguments with a literal vector
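To make the bottom-up folding concrete, here is a minimal sketch on a toy expression tree. The ex_node type and ex_simplify() are hypothetical stand-ins that I made up for illustration; they are not Mesa's slang_operation or _slang_simplify(), and they only handle addition, multiplication and negation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for slang_operation. */
typedef enum { EX_LITERAL, EX_VAR, EX_ADD, EX_MUL, EX_NEG } ex_kind;

typedef struct ex_node {
   ex_kind kind;
   float value;                /* only meaningful for EX_LITERAL */
   struct ex_node *child[2];   /* operands; unused slots are NULL */
} ex_node;

/* Turn a node into a literal and drop its operands. */
static void ex_make_literal(ex_node *n, float value)
{
   n->kind = EX_LITERAL;
   n->value = value;
   n->child[0] = n->child[1] = NULL;
}

/* Fold constant subtrees in place. Children are simplified first, so
 * by the time we look at a node its operands are already as simple as
 * they can get. That is what makes the folding bottom-up. */
static void ex_simplify(ex_node *n)
{
   if (n == NULL)
      return;
   ex_simplify(n->child[0]);
   ex_simplify(n->child[1]);

   switch (n->kind) {
   case EX_ADD:
   case EX_MUL:
      assert(n->child[0] != NULL && n->child[1] != NULL);
      if (n->child[0]->kind == EX_LITERAL &&
          n->child[1]->kind == EX_LITERAL) {
         float a = n->child[0]->value;
         float b = n->child[1]->value;
         ex_make_literal(n, n->kind == EX_ADD ? a + b : a * b);
      }
      break;
   case EX_NEG:
      assert(n->child[0] != NULL);
      if (n->child[0]->kind == EX_LITERAL)
         ex_make_literal(n, -n->child[0]->value);
      break;
   default:
      break;   /* literals and variables are left alone */
   }
}
```

With this sketch, a tree for (2 + 3) * 4 collapses into a single literal, while x + (2 * 3) keeps its top-level addition and only the right operand gets folded.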
Alright, back to _slang_codegen_function(), which calls _slang_gen_operation(). This function has a giant switch statement, where it switches over all of the operation types. For each one, it calls a relevant _slang_gen_*() function. Interestingly enough, operators such as addition and multiplication, as well as operators like the post-increment operator, are translated directly into a call to a function named after the operator (for example, "+"), which is defined in the builtin .gc files. This means that most of the work that we're about to do is just nested function calls. Ultimately, there are some functions (such as adding two floats) that can't delegate to other functions; these functions are defined with the "__asm" keyword in the .gc files. We also care about assigning things to variables, and sequencing these assignments. "if" statements and loops are also interesting. Because shaders can't really do any IO, there's not much else that shaders can do.

Alright, let's take these one at a time, starting with assembly instructions. The relevant node is SLANG_OPER_ASM, which just calls _slang_gen_asm(). Let's look at an example, taken from src/mesa/shader/slang/library/slang_core.gc:

int __operator + (const int a, const int b)
{
   __asm vec4_add __retVal, a, b;
}

You may notice a couple of things about this function. First of all, even for adding scalars (such as the ints here), the command adds vec4s. This is because Mesa assumes that all registers are vec4s, which is true on many graphics cards. Secondly, the function outputs into a variable called "__retVal", which Mesa uses as a hidden return value. Each assembly statement gets marked with its own "__asm" keyword, and each assembly command takes at most 3 arguments. This can also be verified by looking at the slang_ir_node struct in src/mesa/shader/slang/slang_ir.h:

typedef struct slang_ir_node_
{
   slang_ir_opcode Opcode;
   struct slang_ir_node_ *Children[3];
   slang_ir_storage *Store;  /**< location of result of this operation */
   GLint InstLocation;  /**< Location of instruction emitted for this node */

   /** special fields depending on Opcode: */
   const char *Field;  /**< If Opcode == IR_FIELD */
   GLfloat Value[4];    /**< If Opcode == IR_FLOAT */
   slang_variable *Var;  /**< If Opcode == IR_VAR or IR_VAR_DECL */
   struct slang_ir_node_ *List;  /**< For various linked lists */
   struct slang_ir_node_ *Parent;  /**< Pointer to logical parent (ie. loop) */
   slang_label *Label;  /**< Used for branches */
   const char *Comment; /**< If Opcode == IR_COMMENT */
} slang_ir_node;

slang_ir_opcode is an enum with all the different kinds of nodes. For example, there's IR_ADD, IR_CALL, IR_COPY, IR_IF, IR_LABEL, IR_CROSS, among others. I've described the types of the rest of the fields in my previous parsing post.

Alright, let's get back to _slang_gen_asm(). This function does some sanity checks on the input, and calls slang_find_asm_info(id), which just returns a mapping from the string "vec4_add" used in the source to the actual IR_ADD operator. It then constructs a node using this operator, and calls _slang_gen_operation() for each of the children of the operation, assigning the IR node's children accordingly. The actual IR node creation function is new_node3(), which takes an opcode and three slang_ir_nodes to set as the newly created node's children. There are also new_node2(), new_node1(), and new_node0(), which call new_node3() with NULL for the extra arguments.
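The lookup-then-construct pattern described above can be sketched roughly as follows. Everything prefixed with toy_ is a hypothetical simplification of mine, not Mesa's actual code: the table only knows "vec4_add" (taken from the example above) plus a made-up "toy_mul" entry, and the node only carries an opcode and three child slots.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef enum { TOY_IR_NOP, TOY_IR_ADD, TOY_IR_MUL } toy_opcode;

/* Maps the mnemonic written after __asm to an IR opcode. */
typedef struct {
   const char *name;
   toy_opcode opcode;
} toy_asm_info;

static const toy_asm_info toy_asm_table[] = {
   { "vec4_add", TOY_IR_ADD },
   { "toy_mul",  TOY_IR_MUL },   /* made-up entry, for illustration only */
   { NULL,       TOY_IR_NOP }    /* sentinel that ends the table */
};

/* Linear search; returns NULL for an unknown mnemonic. */
static const toy_asm_info *toy_find_asm_info(const char *name)
{
   const toy_asm_info *info;
   assert(name != NULL);
   for (info = toy_asm_table; info->name != NULL; info++) {
      if (strcmp(info->name, name) == 0)
         return info;
   }
   return NULL;
}

typedef struct toy_ir_node {
   toy_opcode opcode;
   struct toy_ir_node *children[3];
} toy_ir_node;

/* The one real constructor: allocate a node and attach three children. */
static toy_ir_node *toy_new_node3(toy_opcode op, toy_ir_node *c0,
                                  toy_ir_node *c1, toy_ir_node *c2)
{
   toy_ir_node *n = calloc(1, sizeof *n);
   if (n != NULL) {
      n->opcode = op;
      n->children[0] = c0;
      n->children[1] = c1;
      n->children[2] = c2;
   }
   return n;
}

/* The smaller variants just pad the unused child slots with NULL. */
static toy_ir_node *toy_new_node2(toy_opcode op, toy_ir_node *c0,
                                  toy_ir_node *c1)
{
   return toy_new_node3(op, c0, c1, NULL);
}

static toy_ir_node *toy_new_node0(toy_opcode op)
{
   return toy_new_node3(op, NULL, NULL, NULL);
}
```

Funneling every constructor through one function means there is a single place that allocates and initializes a node, which is presumably why the real code is layered the same way.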

There's one more thing that _slang_gen_asm() does. It checks to see if the number of arguments specified in the source is the same as the number of arguments that the assembly command expects (obtained from slang_find_asm_info()). If it isn't, that means that the source doesn't specify the storage for the result of the operation. If the storage isn't specified, a temporary will be allocated later. On the other hand, if storage is specified, we have to set up the Store field of the new IR node. It does this by calling _slang_gen_operation() on the result argument, then taking its Store value and copying it into the new node's Store member.

Alright, let's talk about function calls now. The relevant function is _slang_gen_function_call_name(). The first thing this function does is call _slang_function_locate() with the function name string to try to find the actual slang_function that's being called. _slang_function_locate(), defined in src/mesa/shader/slang/slang_compile_function.c, walks the slang_function_scope, iterating through all the functions in that scope. It matches a function that has the correct name, but also matches argument types by iterating through them and calling slang_type_specifier_compatible(), defined in src/mesa/shader/slang/slang_typeinfo.c. This function has a special case for comparing ints and floats (they are compatible), but otherwise just makes sure that the types are equal. If the types are structs, it calls slang_struct_equal(), which works similarly. If the two types are arrays, it recurses with the inner array type. If no functions are found, _slang_gen_function_call_name() tries to find an appropriate function by looking for a constructor and trying to cast/unroll constructors.
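Here is a rough sketch of that name-plus-argument-types matching. The toy_type and toy_function types are hypothetical and much smaller than Mesa's: there are no structs and no nested scopes, and only the int/float special case and the array recursion from the paragraph above are modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { T_BOOL, T_INT, T_FLOAT, T_ARRAY } toy_kind;

typedef struct toy_type {
   toy_kind kind;
   const struct toy_type *elem;   /* element type when kind == T_ARRAY */
} toy_type;

/* int and float are treated as compatible; otherwise the kinds must
 * match, and arrays are compared by recursing on their element type. */
static int toy_type_compatible(const toy_type *x, const toy_type *y)
{
   assert(x != NULL && y != NULL);
   if ((x->kind == T_INT && y->kind == T_FLOAT) ||
       (x->kind == T_FLOAT && y->kind == T_INT))
      return 1;
   if (x->kind != y->kind)
      return 0;
   if (x->kind == T_ARRAY)
      return toy_type_compatible(x->elem, y->elem);
   return 1;
}

typedef struct {
   const char *name;
   int num_params;
   const toy_type *params[4];
} toy_function;

/* Return the first function whose name and parameter types all match
 * the call, or NULL if there is none. */
static const toy_function *
toy_function_locate(const toy_function *funcs, int num_funcs,
                    const char *name,
                    const toy_type *const *args, int num_args)
{
   int i, j;
   for (i = 0; i < num_funcs; i++) {
      if (strcmp(funcs[i].name, name) != 0 ||
          funcs[i].num_params != num_args)
         continue;
      for (j = 0; j < num_args; j++) {
         if (!toy_type_compatible(funcs[i].params[j], args[j]))
            break;
      }
      if (j == num_args)
         return &funcs[i];
   }
   return NULL;
}
```

Given overloads f(bool) and f(float), a call with an int argument resolves to f(float) in this sketch, because bool and int are different kinds while int and float are treated as compatible.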

Now, once we've found a function, if the function doesn't have a body, we need to set a flag telling the linker that it needs to link the function body to this call. At this point, we can finally call _slang_gen_function_call(), which tries to inline the function. It calls slang_inline_function_call(), which generates a slang_operation representing the function, and then proceeds to try to inline that operation. I'll describe slang_inline_function_call() in a second, but for now we have to know that it generates "SLANG_OPER_RETURN_INLINED" instructions instead of "SLANG_OPER_RETURN" instructions. These have different runtime semantics.

Inlining is tricky. If the only return statement is at the very end of the function, we can just replace the return_inlined statement with a noop and return the operation. However, if execution hits a return in the middle of the function, execution has to bypass the rest of the function. There are two ways to deal with this: we can either use a return flag and wrap the rest of the function in a giant if statement, or we can simply not inline the function. The reason that we try to inline all these functions is that many graphics cards don't have a runtime stack, and so can't properly call and return from functions. If that's the case, we have to use a return flag. The driver can specify at context creation time whether we should be using a return flag or not. If we're told not to use a return flag, then we replace the "SLANG_OPER_RETURN_INLINED" nodes with "SLANG_OPER_RETURN" nodes, and change the top-level operation's type to "SLANG_OPER_NON_INLINED_CALL". Otherwise, if we're using a return flag, we call declare_return_flag() to add a new child to the operation, and create a boolean variable called "__notRetFlag". Then we change the top-level operation's type to "SLANG_OPER_NON_INLINED_CALL" just like we would have before. At the very end, we recursively call _slang_gen_operation() on the body of the function. This means that we don't translate functions that can't be reached from main().
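The return-flag idea is easier to see on a concrete function. Below is a hand-written illustration in plain C rather than anything Mesa generates: the first function uses early returns, and the second is what the same logic looks like once every return becomes "store the result and clear a flag", with the rest of the body guarded by that flag. The names retVal and notRetFlag mirror __retVal and __notRetFlag, minus the leading underscores, which are reserved in C.

```c
#include <assert.h>

/* Original shape: returns in the middle of the function. */
static float clamp01_early_return(float x)
{
   if (x < 0.0f)
      return 0.0f;
   if (x > 1.0f)
      return 1.0f;
   return x;
}

/* Same logic with a single exit. Every "return v" becomes
 * "retVal = v; notRetFlag = 0;", and everything after a possible
 * return is wrapped in "if (notRetFlag)". */
static float clamp01_return_flag(float x)
{
   float retVal = 0.0f;
   int notRetFlag = 1;

   if (x < 0.0f) {
      retVal = 0.0f;
      notRetFlag = 0;
   }
   if (notRetFlag) {
      if (x > 1.0f) {
         retVal = 1.0f;
         notRetFlag = 0;
      }
      if (notRetFlag) {
         retVal = x;
         notRetFlag = 0;
      }
   }
   assert(notRetFlag == 0);   /* every path has "returned" by now */
   return retVal;
}
```

The second form contains no jump out of the middle of the body, which is the property a target without real call/return support needs. The cost is the extra flag tests on every path.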

Alright, now let's take a look at slang_inline_function_call(). The last argument to this function, returnOper, is a pointer to an operation that the return value should fill in. This is so that, if we have something like "x = f(a, b)", we can avoid a copy from a temporary into x. If returnOper is NULL, we have to allocate a temporary called __resultTmp, but only if the function returns a value. This is done by creating a new operation with three children: one to declare __resultTmp, one which actually runs the body of the function, and one to specify that the output is __resultTmp. That first child has a type of SLANG_OPER_VARIABLE_DECL. The last child has a type of SLANG_OPER_IDENTIFIER.

The next thing we've got to do is deal with function arguments. In particular, arguments are passed by value, so we have to copy the values into the function's local scope, but only if the parameter isn't const. We also have to copy output values from their temporaries to their actual storage. We do this by creating an array of substitution information. Each element in the array specifies a variable name to substitute and an operation to use for the substitution, as well as an enum to determine if the argument is an IN or an OUT variable. (It's actually three arrays; the code uses a struct-of-arrays style instead of an array-of-structs style). We can then copy the body of the function into the slang_function->body member. Then we call slang_substitute() to run the actual substitution with those three arrays we just built up, which walks the operation tree, making copies of nodes and modifying them to substitute the specified variables for operations.
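As a rough picture of the substitution walk, here is a toy version. The node type, the subst_table layout and substitute() are all hypothetical; the three parallel arrays just mirror the struct-of-arrays style mentioned above, and this toy walk never consults the direction array, which only matters later when the copy-in and copy-out code is generated.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { N_IDENT, N_LITERAL, N_ADD } node_kind;

typedef struct node {
   node_kind kind;
   const char *name;        /* for N_IDENT */
   float value;             /* for N_LITERAL */
   struct node *child[2];   /* for N_ADD */
} node;

typedef enum { SUBST_IN, SUBST_OUT } subst_dir;

/* Struct-of-arrays: entry i is (names[i], replacements[i], dirs[i]). */
typedef struct {
   int count;
   const char **names;       /* parameter names to look for */
   node **replacements;      /* what each parameter becomes */
   const subst_dir *dirs;    /* IN or OUT; unused by this toy walk */
} subst_table;

/* Walk the tree and overwrite every identifier that names a parameter
 * with a shallow copy of its replacement. A replaced node is not
 * visited again, so a replacement can safely mention other names. */
static void substitute(node *n, const subst_table *t)
{
   int i;
   if (n == NULL)
      return;
   if (n->kind == N_IDENT) {
      assert(n->name != NULL);
      for (i = 0; i < t->count; i++) {
         if (strcmp(n->name, t->names[i]) == 0) {
            *n = *t->replacements[i];
            return;
         }
      }
      return;
   }
   substitute(n->child[0], t);
   substitute(n->child[1], t);
}
```

For a function body "a + b" called with the literal 2 and the caller's variable x, the walk rewrites the tree to "2 + x" in this sketch.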

Now, we generate the copy instructions that are necessary for the input parameters. For each input parameter, we call slang_operation_insert() to insert a dummy operation into the beginning of the stream. We then fill in this operation with SLANG_OPER_VARIABLE_DECL, and create a single child, and call slang_operation_copy() on it, which emits the copy instruction. We also then have to add the variable to the local scope by calling slang_variable_scope_grow() and filling in the new slang_variable. Once we're done with this, we have to add the function's explicit local variables to the local scope, which is done similarly.

Now we deal with the epilogue. We create a label with slang_operation_insert() of type SLANG_OPER_LABEL so that return has somewhere to jump to. Then, similar to the prologue, we go through the COPY_OUT arguments, and insert SLANG_OPER_ASSIGN operations with slang_operation_insert(). The last thing we do is call slang_replace_scope(), defined in src/mesa/shader/slang/slang_compile_operation.c, which walks the operation tree finding operations which target the old scope, and updates them to use the new scope.

Phew! That was a lot to deal with. There's only a little bit more that's relevant: assignments and sequences. Sequences are really straightforward: An input node of type SLANG_OPER_BLOCK_NO_NEW_SCOPE specifies a sequence of instructions. Each of these gets translated, then is passed to new_seq(tree, n). This function creates a binary node of type IR_SEQ, which means that the block gets turned into a linked list of sequenced operations. An input node of type SLANG_OPER_BLOCK_NEW_SCOPE simply creates a new scope, then delegates to SLANG_OPER_BLOCK_NO_NEW_SCOPE.
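To show how repeated calls to a new_seq()-style function chain a block together, here is a small sketch. The toy_ir type and both functions are hypothetical. In particular, treating a NULL side as "nothing to sequence" and building a left-nested chain are assumptions of mine about the shape, not something I'm quoting from the Mesa source.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { TOY_IR_STMT, TOY_IR_SEQ } toy_opcode;

typedef struct toy_ir {
   toy_opcode opcode;
   int id;                    /* just labels statements in this toy */
   struct toy_ir *children[2];
} toy_ir;

/* Join two trees with a binary SEQ node. If either side is missing,
 * there is nothing to sequence, so the other side is returned as-is
 * (assumed behavior for this sketch). */
static toy_ir *toy_new_seq(toy_ir *left, toy_ir *right)
{
   toy_ir *seq;
   if (left == NULL)
      return right;
   if (right == NULL)
      return left;
   seq = calloc(1, sizeof *seq);
   if (seq != NULL) {
      seq->opcode = TOY_IR_SEQ;
      seq->children[0] = left;
      seq->children[1] = right;
   }
   return seq;
}

/* Fold a block's already-translated statements into one tree:
 * SEQ(SEQ(s0, s1), s2) for three statements. */
static toy_ir *toy_gen_block(toy_ir *stmts[], int count)
{
   toy_ir *tree = NULL;
   int i;
   assert(count >= 0);
   for (i = 0; i < count; i++)
      tree = toy_new_seq(tree, stmts[i]);
   return tree;
}
```

Walking the resulting tree left child first, then right child, visits the statements in their original order, which is all a sequence node has to guarantee.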

That leaves assignments, which are not super complicated either. The relevant function is _slang_gen_assignment(). If the destination is a variable, we look up the variable with _slang_variable_locate() and see if it's writable. Then, we need to see if our assignment is predicated on the __notRetFlag that I described earlier. If it is, we create a new predication operation for use later. Now, we see if the rvalue of the assignment is a function call; if so, we can use the function return copy optimization I referred to earlier. Otherwise, we check to see if the types are compatible in assignment with a call to _slang_assignment_compatible(), which checks to see if the sizes of the types match, and if so, checks some special failure cases (assigning from bool to float or int, or assigning between two different structs of different names, etc). Otherwise, it just returns true. Then, _slang_gen_assignment() calls _slang_gen_operation() for the destination of the assignment, and checks to see that the operation has the Store value set, and that it's writable. We then call _slang_gen_operation() for the right side, convert the destination's Store's swizzle to a writemask if possible, then call new_node2(IR_COPY, lhs, rhs). Now, if the predication operation that we created before exists, we create a new_if() operation instead, and use that. Otherwise we just return the copy node.
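The "swizzle to a writemask if possible" step deserves a small illustration. The sketch below is my own simplified model, not Mesa's representation: the swizzle is a string such as "xz", the writemask is a 4-bit value, and I assume the conversion fails whenever a component is repeated, since something like v.xx = ... can't name a distinct set of destination channels.

```c
#include <assert.h>
#include <stddef.h>

enum { MASK_X = 1, MASK_Y = 2, MASK_Z = 4, MASK_W = 8 };

/* Convert a destination swizzle such as "xz" into a bitmask of the
 * channels being written. Returns 1 and stores the mask on success.
 * Returns 0 if the swizzle is empty, contains an unknown component,
 * or names the same component twice. */
static int toy_swizzle_to_writemask(const char *swizzle,
                                    unsigned *mask_out)
{
   unsigned mask = 0;
   const char *p;

   assert(swizzle != NULL && mask_out != NULL);
   for (p = swizzle; *p != '\0'; p++) {
      unsigned bit;
      switch (*p) {
      case 'x': bit = MASK_X; break;
      case 'y': bit = MASK_Y; break;
      case 'z': bit = MASK_Z; break;
      case 'w': bit = MASK_W; break;
      default:  return 0;   /* not a component name */
      }
      if (mask & bit)
         return 0;          /* repeated component: no valid writemask */
      mask |= bit;
   }
   if (mask == 0)
      return 0;             /* empty swizzle */
   *mask_out = mask;
   return 1;
}
```

In this model, an assignment to v.xz turns into a copy whose destination only has the x and z channels enabled, so the y and w channels of v are left untouched.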

Phew, that was a whole lot. Alright, now we've got a representation of the program that's fairly close to the assembly that we want to generate in the end. Next will be converting the IR into a stream of instructions that we can actually execute at runtime.
