Friday, April 12, 2013

Design of Mesa 3D Part 3: Dispatch

Previously, I have written about how Mesa 3D's implementation of GLX works. Now, I'd like to transition to OpenGL calls.

First, however, we've got to take a step back and look at one detail that I glossed over before. We learned in part 2 that one of the important calls that glXMakeCurrent() makes is a call to _mesa_make_current(), defined in src/mesa/main/context.c. One of the things that this function does is call _glapi_set_dispatch(newCtx->CurrentDispatch), which lives in src/mesa/glapi/glapi.c. The argument to this function is a pointer to a _glapi_table struct, which contains one function pointer for each of the OpenGL calls. The file where this struct is defined, src/mesa/glapi/glapitable.h, is automatically generated by the src/mesa/glapi/gl_table.py script. This script uses, as input, src/mesa/glapi/gl_API.xml, which is a hand-constructed XML file containing information about each OpenGL call. There's also an accompanying src/mesa/glapi/glX_API.xml file, as well as a src/mesa/glapi/gl_and_glX_API.xml that includes both of the previous two.
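To make the idea of a dispatch table concrete, here is a minimal sketch in C. Everything in it (mini_table, mini_set_dispatch, mini_Viewport, the impl_ functions) is a made-up miniature for illustration, not Mesa's code: the real _glapi_table has hundreds of members, and the real _glapi_set_dispatch() has threaded variants that this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of a dispatch table: one function pointer per
 * API entry point. */
struct mini_table {
    void (*ClearColor)(float r, float g, float b, float a);
    void (*Viewport)(int x, int y, int width, int height);
};

/* Stand-in for the "current dispatch" slot. This is the no-threads
 * flavor: a plain global pointer. */
static struct mini_table *current_dispatch = NULL;

static void mini_set_dispatch(struct mini_table *table)
{
    assert(table != NULL);
    current_dispatch = table;
}

/* Records the last viewport so the effect of a call is observable. */
static int last_viewport[4];

static void impl_Viewport(int x, int y, int width, int height)
{
    last_viewport[0] = x;
    last_viewport[1] = y;
    last_viewport[2] = width;
    last_viewport[3] = height;
}

static void impl_ClearColor(float r, float g, float b, float a)
{
    (void)r; (void)g; (void)b; (void)a;   /* nothing to do in this sketch */
}

/* Public entry point: looks up the current table and forwards its
 * arguments to whatever implementation is installed there. */
void mini_Viewport(int x, int y, int width, int height)
{
    assert(current_dispatch != NULL);
    current_dispatch->Viewport(x, y, width, height);
}
```

A caller would fill in a struct mini_table with impl_ClearColor and impl_Viewport, pass it to mini_set_dispatch(), and from then on only call mini_Viewport().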

Anyway, _glapi_set_dispatch(), once again, has three implementations: one for TLS, one for threads, and one for no-threads. The threaded implementation calls _glthread_SetTSD() with a key of _gl_DispatchTSD, declared in src/mesa/glapi/glapi.c, which is of type _glthread_TSD. The _glthread_SetTSD() function was described in a previous post, so I'll just post the struct definition here.


typedef struct {
   pthread_key_t  key;
   int initMagic;
} _glthread_TSD;


You can see that the initMagic variable is used to see if the key has been initialized, and the key can be used as an argument to pthread_getspecific(). So now, someone calls a GL function. The entry point for every GL function lives in an assembly file, src/mesa/x86-64/glapi_x86-64.S. For each function, the file has a stanza that looks like this:


.p2align 4,,15
.globl GL_PREFIX(Viewport)
.type GL_PREFIX(Viewport), @function
GL_PREFIX(Viewport):
#if defined(GLX_USE_TLS)
...
#elif defined(PTHREADS)
pushq %rdi
pushq %rsi
pushq %rdx
pushq %rcx
pushq %rbp
call _x86_64_get_dispatch@PLT
popq %rbp
popq %rcx
popq %rdx
popq %rsi
popq %rdi
movq 2440(%rax), %r11
jmp *%r11
#else
...
#endif /* defined(GLX_USE_TLS) */
.size GL_PREFIX(Viewport), .-GL_PREFIX(Viewport)


As you can see, this particular function has three different implementations as well. The POSIX threads implementation pushes some registers onto the stack (including the ones that carry the integer arguments), then calls _x86_64_get_dispatch through the PLT. It then pops those registers back off the stack, loads a function pointer that sits at a constant offset from the return value of the previous call into a register, and jumps to that function. It's important to realize here that the register + memory state at the 'jmp' instruction is the same as it was when the function was called. Therefore, this is a way to do argument forwarding (which is otherwise impossible to do generically in C). This makes the argument forwarding code easier to write (it can even be written by a script) and faster at runtime. In short, the stub calls another function with the same arguments that it was given. That other function is looked up with the _x86_64_get_dispatch function, which leads us to its implementation:


#ifdef GLX_USE_TLS
...
#elif defined(PTHREADS)
.extern _glapi_Dispatch
.extern _gl_DispatchTSD
.extern pthread_getspecific
.p2align 4,,15
_x86_64_get_dispatch:
movq _gl_DispatchTSD(%rip), %rdi
jmp pthread_getspecific@PLT
#elif defined(THREADS)
...
#endif


This shows that this function is simply another implementation of _glthread_GetTSD(), where the key here is _gl_DispatchTSD. "But wait," you may be interjecting, "a _glthread_TSD can't be used as an argument to pthread_getspecific()!" However, the movq loads from the address of _gl_DispatchTSD, and because the first member of a _glthread_TSD is a pthread_key_t, the struct and its key live at the same address. The assembly is just using shorthand for a zero offset. (This code will also break if someone rearranges the members of the struct.)
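Here is a rough C rendering of that trick, under some assumptions: tsd_t mirrors the struct shown earlier, while TSD_INIT_MAGIC, tsd_init(), tsd_set(), tsd_get() and tsd_get_via_struct_address() are names I made up for this sketch (they are not Mesa's). The last function does what the assembly does: it reads the key through the address of the whole struct, which C allows because a pointer to a struct also points to its first member.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Same layout as the struct shown earlier in the post. */
typedef struct {
    pthread_key_t key;
    int initMagic;
} tsd_t;

/* Arbitrary marker value chosen for this sketch. */
#define TSD_INIT_MAGIC 0x1234

static void tsd_init(tsd_t *tsd)
{
    if (pthread_key_create(&tsd->key, NULL) != 0)
        abort();
    tsd->initMagic = TSD_INIT_MAGIC;
}

/* Store a per-thread value, creating the key on first use. */
static void tsd_set(tsd_t *tsd, void *value)
{
    assert(tsd != NULL);
    if (tsd->initMagic != TSD_INIT_MAGIC)
        tsd_init(tsd);
    if (pthread_setspecific(tsd->key, value) != 0)
        abort();
}

/* The "normal" getter: names the key member explicitly. */
static void *tsd_get(tsd_t *tsd)
{
    assert(tsd != NULL);
    if (tsd->initMagic != TSD_INIT_MAGIC)
        tsd_init(tsd);
    return pthread_getspecific(tsd->key);
}

/* The assembly's shortcut: treat the struct's address as the key's
 * address. This only works while key is the first member, and it
 * assumes the key has already been created. */
static void *tsd_get_via_struct_address(tsd_t *tsd)
{
    pthread_key_t *key_ptr = (pthread_key_t *)tsd;
    return pthread_getspecific(*key_ptr);
}
```

On older toolchains this needs to be built with -pthread. Both getters return the same pointer for the calling thread, and moving initMagic in front of key would silently break the second one.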

The last item in the implementation of the assembly dispatch code is the offset applied to the pointer returned by _x86_64_get_dispatch. This is simply the offset of the function pointer inside the _glapi_table struct. This can be verified because the function pointers are numbered starting at 0 in src/mesa/glapi/glapitable.h. Each function pointer is 8 bytes (since it's a 64-bit machine), so glViewport's index, 305, times 8 bytes per pointer, equals 2440, which is exactly the offset used by the movq just before the jmp. It's also worth noting that because the stub never sets up a stack frame of its own and leaves via jmp rather than call, it doesn't have to ret: the target function's ret returns straight to the original caller.
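The offset arithmetic can be checked with a few lines of C. This is only a sketch: slot_table, generic_fn, slot_offset(), load_slot() and marker() are hypothetical names, and the table is modeled as a flat array of identically typed pointers because only the layout matters here. The index 305 is the one quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*generic_fn)(void);

enum { VIEWPORT_SLOT = 305 };   /* index quoted in the post */

/* Hypothetical stand-in for the dispatch table: a flat run of
 * function pointers, slot 0 first. */
struct slot_table {
    generic_fn slots[VIEWPORT_SLOT + 1];
};

/* Byte offset of a slot from the start of the table. */
static size_t slot_offset(size_t index)
{
    return offsetof(struct slot_table, slots) + index * sizeof(generic_fn);
}

/* Rough C equivalent of "movq OFFSET(%rax), %r11": fetch the pointer
 * stored at a byte offset inside the table. */
static generic_fn load_slot(const struct slot_table *table, size_t byte_offset)
{
    generic_fn fn;
    assert(table != NULL);
    memcpy(&fn, (const char *)table + byte_offset, sizeof fn);
    return fn;
}

/* Dummy implementation so that a call through the slot is observable. */
static int marker_calls;
static void marker(void)
{
    marker_calls++;
}
```

With 8-byte function pointers, slot_offset(VIEWPORT_SLOT) comes out to 2440, matching the constant in the assembly. On a platform with 4-byte pointers it would be 1220 instead, which is one reason the stubs are generated per architecture.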

One more thing: Drivers have to fill in the CurrentDispatch member of the GLcontext struct with a function table before calling _mesa_make_current(). Most drivers get a baseline table by calling _mesa_init_exec_table(), defined in src/mesa/main/api_exec.c. This function initializes all the function pointers with the default Mesa implementations. However, the driver is then able to modify the resulting vtable before calling _mesa_make_current(). This allows drivers to swap out whole functions of the OpenGL API.
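The "baseline table, then override" pattern looks roughly like this in C. This is a sketch under assumptions: exec_table, init_exec_table(), driver_setup_table() and the counting functions are invented for illustration, and init_exec_table() only plays the role that _mesa_init_exec_table() plays in Mesa.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical two-entry function table. */
struct exec_table {
    void (*Viewport)(int x, int y, int width, int height);
    void (*Flush)(void);
};

/* Call counters, so it is visible which implementation ran. */
static int default_viewport_calls;
static int default_flush_calls;
static int driver_flush_calls;

static void default_Viewport(int x, int y, int width, int height)
{
    (void)x; (void)y; (void)width; (void)height;
    default_viewport_calls++;
}

static void default_Flush(void)
{
    default_flush_calls++;
}

/* Baseline: every slot gets the default implementation. */
static void init_exec_table(struct exec_table *table)
{
    assert(table != NULL);
    table->Viewport = default_Viewport;
    table->Flush = default_Flush;
}

/* A driver-specific replacement for one entry point. */
static void driver_Flush(void)
{
    driver_flush_calls++;
}

/* What a driver might do while creating its context: start from the
 * defaults, then swap out the entries it wants to own entirely. */
static void driver_setup_table(struct exec_table *table)
{
    init_exec_table(table);
    table->Flush = driver_Flush;
}
```

After driver_setup_table(), calls through Viewport still reach the default code, while calls through Flush reach only the driver's version.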

So now, we've finally traced to the meat of a default GL function, _mesa_Viewport(), defined in src/mesa/main/viewport.c. This function delegates to a helper, _mesa_set_viewport(), which simply changes some parameters in ctx->Viewport. It then sets the _NEW_VIEWPORT bit in ctx->NewState, so that future (more substantive) functions will be able to react to the change. The last thing it does is interesting: it runs if (ctx->Driver.Viewport) { ctx->Driver.Viewport(...) }. This ctx->Driver is a similar vtable, of type dd_function_table, defined in src/mesa/main/dd.h, which exports a different, Mesa-specific API. This struct's API seems to be almost exactly the same as OpenGL, except these functions are executed 'in addition to' Mesa's default OpenGL implementation, not 'instead of.' This struct member is usually set by a call to _mesa_init_driver_functions(), defined in src/mesa/drivers/common/driverfuncs.c, inside the implementation of the driver's context creation function. The device driver is then free to modify the vtable as it pleases. For many of these smaller, simpler functions, the driver function is just NULL.
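Boiled down, the "update state, flag it, then notify the driver if it cares" shape looks something like the following. It is a simplified sketch, not Mesa's code: mini_ctx, driver_hooks, NEW_VIEWPORT_BIT, set_viewport() and counting_hook() are hypothetical stand-ins for the GLcontext, dd_function_table, _NEW_VIEWPORT and _mesa_set_viewport() described above.

```c
#include <assert.h>
#include <stddef.h>

#define NEW_VIEWPORT_BIT 0x1u   /* hypothetical stand-in for _NEW_VIEWPORT */

struct mini_ctx;

/* Driver hooks run in addition to the default code; any may be NULL. */
struct driver_hooks {
    void (*Viewport)(struct mini_ctx *ctx, int x, int y, int width, int height);
};

struct mini_ctx {
    struct { int X, Y, Width, Height; } Viewport;
    unsigned NewState;            /* dirty bits for later validation */
    struct driver_hooks Driver;
    int hook_calls;               /* bookkeeping for this sketch only */
};

static void set_viewport(struct mini_ctx *ctx, int x, int y, int width, int height)
{
    assert(ctx != NULL);

    /* 1. Record the new state in the context. */
    ctx->Viewport.X = x;
    ctx->Viewport.Y = y;
    ctx->Viewport.Width = width;
    ctx->Viewport.Height = height;

    /* 2. Mark it dirty so later, heavier functions can react. */
    ctx->NewState |= NEW_VIEWPORT_BIT;

    /* 3. Give the driver a chance to react, if it installed a hook. */
    if (ctx->Driver.Viewport)
        ctx->Driver.Viewport(ctx, x, y, width, height);
}

/* Example hook that just counts how often it was invoked. */
static void counting_hook(struct mini_ctx *ctx, int x, int y, int width, int height)
{
    (void)x; (void)y; (void)width; (void)height;
    ctx->hook_calls++;
}
```

With a zero-initialized context the hook pointer is NULL and only steps 1 and 2 happen. Once a driver installs counting_hook, step 3 runs as well, after the default bookkeeping rather than in place of it.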

Here are notes on some other small functions (which all share the dispatch code and all call their relevant driver function after execution):

  • _mesa_GetString(), in src/mesa/main/getstring.c, switches on its argument, and looks at ctx->Extensions to compute its version string.
  • _mesa_ClearColor(), in src/mesa/main/clear.c, simply modifies ctx->Color.ClearColor.
  • _mesa_Enable() and _mesa_EnableClientState(), both in src/mesa/main/enable.c, delegate to _mesa_set_enable(), which switches on its argument and sets flags inside ctx.
  • _mesa_BlendFunc(), in src/mesa/main/blend.c, does a large amount of error checking, and then modifies ctx->Color.Blend*.

Alright, that's a good time to stop for now. Next time I'll write about how shaders get compiled.
