Tuesday, December 2, 2014

Design of Mesa 3D Part 10: Intel’s Device Driver

FreeBSD 10.1 is out [1] which means I’ve got a good opportunity to take another look at Mesa. Since I last looked at it, the FreeBSD Ports system has been updated to the latest version of Mesa, version 10.3.3 [2], which is 3 major versions past where I was looking before. Needless to say, much has changed.

Once again, I’d like to discuss the players here.

The first player is libglapi. This is a pretty simple library which contains two thread-local variables: one which holds the current OpenGL dispatch table, and another which holds the current OpenGL context. Mesa can set and get the first one with _glapi_set_dispatch() and _glapi_get_dispatch(). The dispatch table holds a collection of function pointers, and every OpenGL API call gets directed through a dispatch table lookup. Mesa can set and get the current OpenGL context with _glapi_set_context() and _glapi_get_context(). Inside libglapi, contexts are treated as a void*.
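To make the dispatch idea concrete, here is a minimal sketch in C. It is not libglapi's code: every name prefixed with sketch_ is made up for illustration, the table has a single entry instead of one per GL entry point, and I'm assuming C11's _Thread_local for the two per-thread slots.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced dispatch table with a single entry. */
struct sketch_dispatch {
    void (*Viewport)(int x, int y, int width, int height);
};

/* Hypothetical context; the dispatch layer only ever sees it as a void*. */
struct sketch_context {
    int viewport[4];
};

/* The two thread-local slots: current dispatch table and current context. */
static _Thread_local struct sketch_dispatch *current_dispatch;
static _Thread_local void *current_context;

void sketch_set_dispatch(struct sketch_dispatch *table) { current_dispatch = table; }
struct sketch_dispatch *sketch_get_dispatch(void) { return current_dispatch; }
void sketch_set_context(void *ctx) { current_context = ctx; }
void *sketch_get_context(void) { return current_context; }

/* One possible implementation of the entry: it records state in the
 * context that is current on the calling thread. */
void sketch_impl_viewport(int x, int y, int width, int height)
{
    struct sketch_context *ctx = sketch_get_context();
    assert(ctx != NULL);
    ctx->viewport[0] = x;
    ctx->viewport[1] = y;
    ctx->viewport[2] = width;
    ctx->viewport[3] = height;
}

/* The public entry point does nothing except look up the current table
 * and jump through the function pointer. */
void sketch_glViewport(int x, int y, int width, int height)
{
    struct sketch_dispatch *table = sketch_get_dispatch();
    assert(table != NULL && table->Viewport != NULL);
    table->Viewport(x, y, width, height);
}
```

To use it, a thread installs a context with sketch_set_context() and a table whose Viewport entry points at sketch_impl_viewport with sketch_set_dispatch(); after that, sketch_glViewport() lands in the implementation. Swapping the table changes what every entry point does without touching the callers.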

Another player is EGL. EGL is a platform abstraction that encapsulates X11 on FreeBSD (it is portable, so it will encapsulate WGL if you’re on Windows, and so on). EGL has a concept of a display and a rendering surface, and is responsible for making an OpenGL context which uses them. It’s also responsible for making an OpenGL context “current” as well as performing the buffer swap at the end of the frame to present your drawn framebuffer. If you want to create an onscreen surface, you have to pass the relevant function a void* which represents the platform-specific renderable that EGL should draw to.

Because EGL is a platform abstraction, it is driver-based. Drivers in Mesa have a single symbol which is either a function that populates a vtable, or the symbol simply is the vtable itself. Currently, there are two EGL drivers in Mesa, but only one seems to have an implementation: the DRI2-based one. This driver has two parts: the main part of the driver, and the part that deals with X11. (If your platform doesn’t use X11, there are other pieces that can replace it; for example, there is a Wayland part which can be used instead of the X11 piece.) The main part of the driver knows about the X11 part, but the calls from it to the X11 piece are behind an #ifdef. The X11 part doesn’t know about the main part, but instead fills out a vtable with function pointers for the main part to call.
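The relationship between the two parts can be sketched as follows. This only illustrates the pattern (a platform piece that fills in a vtable, called by a main piece behind an #ifdef); the struct, the function names, and the SKETCH_HAVE_X11 macro are all hypothetical and are not Mesa's.

```c
#include <stddef.h>

/* Normally the build system would define this; it is set here so the
 * sketch compiles on its own. */
#define SKETCH_HAVE_X11 1

/* Hypothetical vtable that the platform part fills out. */
struct sketch_platform_vtbl {
    int (*create_window_surface)(void *native_window);
    int (*swap_buffers)(int surface);
};

#ifdef SKETCH_HAVE_X11
/* "X11 part": knows nothing about the main part, only the vtable. */
static int sketch_x11_create_window_surface(void *native_window)
{
    /* Return a fake surface handle: 1 on success, 0 on failure. */
    return native_window != NULL ? 1 : 0;
}

static int sketch_x11_swap_buffers(int surface)
{
    return surface > 0;
}

void sketch_x11_fill_vtbl(struct sketch_platform_vtbl *vtbl)
{
    vtbl->create_window_surface = sketch_x11_create_window_surface;
    vtbl->swap_buffers = sketch_x11_swap_buffers;
}
#endif

/* "Main part": its only direct reference to the X11 piece sits behind
 * the #ifdef. Returns 1 if a platform was set up, 0 otherwise. */
int sketch_main_init(struct sketch_platform_vtbl *vtbl)
{
#ifdef SKETCH_HAVE_X11
    sketch_x11_fill_vtbl(vtbl);
    return 1;
#else
    (void)vtbl;
    return 0;
#endif
}
```

After sketch_main_init() succeeds, the main part only ever calls through vtbl, which is what would let a Wayland piece be dropped in without the main part changing.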

The X11 part uses XCB [3] to interact with the X server. Because the user of EGL had to already set up their rendering destination, they already have a connection to the X11 server, so this part piggybacks off of that connection. (If you’re rendering to a pbuffer, this part makes its own XCB connection to the X server). Its responsibility is to handle all of the requirements of DRI that interact with the X server. Luckily, there isn’t very much to do there. Look in xcb/dri2.h for a list of all the calls that are necessary. Relevant ones are:
  • xcb_dri2_query_version(): Returns the version of the DRI infrastructure. Mine says 1.3.
  • xcb_dri2_connect(): This returns two important strings:
    • The driver name that Mesa should use to actually drive the hardware. Mine is i965. Mesa turns this into “dri/i965_dri.so” and will dlopen() it.
    • The file to open to send DRM commands to. All DRM commands are implemented as ioctl()s on a file descriptor. This is the path to the file to open to get the file descriptor. Mine is /dev/dri/card0
  • xcb_dri2_authenticate(): This is the access control that DRI uses. Once you’ve opened the DRM device file, you can start sending commands directly to the hardware. However, before you start, you have to send one particular command, drmGetMagic(), which will return an arbitrary number. You then have to supply this number to xcb_dri2_authenticate() so the X server can authenticate you (using the same authentication mechanisms that any X client uses). Once you’ve done this, you can start sending commands to the hardware via the fd you opened previously.
  • xcb_dri2_create_drawable(): No return value. You pass the onscreen window XID and this will make the window’s buffers available to DRM.
  • xcb_dri2_destroy_drawable(): The reverse of the above.
  • xcb_dri2_get_buffers(): You pass an array of attachments which represent buffers that the X server knows about and will use for compositing the screen. (For example, XCB_DRI2_ATTACHMENT_BUFFER_BACK_LEFT). xcb_dri2_get_buffers() will return a bunch of information regarding each buffer, including the “pitch” of the buffer (the number of bytes between successive rows, though this is hardware-dependent as buffers can be tiled on Intel cards), the CPP (number of bytes per pixel), some flags, and, most importantly, the buffer “name” which is a number which represents the device-specific ID of the buffer.
  • xcb_dri2_swap_buffers(): Once you’re done with a frame, you have to let the X server know that.
  • There are some others, but I’ve omitted them because they’re not very relevant.
So, the X11 part of the EGL driver performs that initial handshake with the X server, and populates a driver_name field, which the main part of the EGL driver uses to open the correct .so. This .so represents the OpenGL driver (not the EGL driver!). Drivers export a vtable of function pointers.
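The name-to-filename step is just string formatting. Here is a small sketch of it; sketch_driver_path() is a hypothetical helper of mine, and it deliberately stops short of what the real loader does afterwards (locating the driver directory and calling dlopen() on the result).

```c
#include <stdio.h>
#include <string.h>

/* Turn a DRI driver name such as "i965" into the relative filename
 * "dri/i965_dri.so". Returns 0 on success, or -1 if the name is empty
 * or the output buffer is too small. */
int sketch_driver_path(const char *driver_name, char *out, size_t out_size)
{
    int written;

    if (driver_name == NULL || strlen(driver_name) == 0)
        return -1;

    written = snprintf(out, out_size, "dri/%s_dri.so", driver_name);
    if (written < 0 || (size_t)written >= out_size)
        return -1; /* formatting error or truncated output */

    return 0;
}
```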

Then, the client program starts making OpenGL calls, which all get redirected through libglapi’s dispatch table. In our case, Mesa implements the GL calls. During most OpenGL function calls, the OpenGL driver doesn’t actually get touched, because most GL calls are simply setting state in the context. For example, glViewport(), glClearColor(), glCreateShader(), and glShaderSource() all just set some state. Even glCompileShader() doesn’t actually touch the i965 driver; Mesa simply performs a compilation to a high-level intermediate representation (“hir”). In i965’s case, the driver only gets touched inside glLinkProgram(), which performs the final compilation for the card. Even glGenBuffers() and glBindBuffer() don’t actually touch the driver; it’s not until glBufferData() that buffers are actually created on the card.

glClear() is pretty interesting; because I only have a gen4 card, there isn’t a mechanism for quickly clearing the framebuffer. Instead, Mesa has a “meta” API where it implements certain API calls in terms of others. In glClear()’s case, it implements it by drawing two triangles which cover the screen, with a trivial fragment shader.
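As a rough picture of what "two triangles which cover the screen" means, here is one possible pair of triangles. The coordinates are my assumption (normalized device coordinates, where the visible area spans -1 to 1 on each axis), not something copied from Mesa's meta code, and the names are hypothetical.

```c
/* Two triangles in assumed normalized device coordinates (x, y) that
 * share the diagonal from (-1, -1) to (1, 1). */
static const float sketch_clear_quad[2][3][2] = {
    { {-1.0f, -1.0f}, { 1.0f, -1.0f}, { 1.0f,  1.0f} },
    { {-1.0f, -1.0f}, { 1.0f,  1.0f}, {-1.0f,  1.0f} },
};

/* Area of one triangle, from the 2D cross product of two of its edges. */
float sketch_triangle_area(const float tri[3][2])
{
    float cross = (tri[1][0] - tri[0][0]) * (tri[2][1] - tri[0][1])
                - (tri[2][0] - tri[0][0]) * (tri[1][1] - tri[0][1]);
    if (cross < 0.0f)
        cross = -cross;
    return 0.5f * cross;
}

/* Combined area of both triangles. The visible square is 2 units on a
 * side, so full coverage corresponds to an area of 4. */
float sketch_clear_coverage(void)
{
    return sketch_triangle_area(sketch_clear_quad[0])
         + sketch_triangle_area(sketch_clear_quad[1]);
}
```

The fragment shader for such a draw would only need to output the clear color for every pixel it is run on.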

glDrawArrays() and glDrawElements() obviously both touch the driver. This is where the bulk of the work is done, and where the driver reads all the state that Mesa has been setting up for it.
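The split between "set state now, talk to the driver at draw time" can be modelled like this. It is a toy, not Mesa's structure: the context and driver structs and the sketch_ functions are hypothetical, and the "driver" just counts how many times it was reached.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the hardware driver: counts how often it was invoked and
 * remembers the state it consumed on the last draw. */
struct sketch_driver {
    int draws_submitted;
    int last_viewport_width;
};

/* Stand-in for the GL context, which just accumulates state. */
struct sketch_gl_context {
    int viewport[4];
    float clear_color[4];
    struct sketch_driver *driver;
};

/* State setters only write into the context; the driver is not involved. */
void sketch_viewport(struct sketch_gl_context *ctx, int x, int y, int w, int h)
{
    ctx->viewport[0] = x;
    ctx->viewport[1] = y;
    ctx->viewport[2] = w;
    ctx->viewport[3] = h;
}

void sketch_clear_color(struct sketch_gl_context *ctx,
                        float r, float g, float b, float a)
{
    ctx->clear_color[0] = r;
    ctx->clear_color[1] = g;
    ctx->clear_color[2] = b;
    ctx->clear_color[3] = a;
}

/* The draw call is where the driver finally reads the accumulated state. */
void sketch_draw_arrays(struct sketch_gl_context *ctx, int first, int count)
{
    assert(ctx->driver != NULL);
    (void)first;
    (void)count;
    ctx->driver->last_viewport_width = ctx->viewport[2];
    ctx->driver->draws_submitted++;
}
```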

There’s one more player, though, that I only mentioned briefly: libdrm [4]. This is how the OpenGL driver actually interacts with the GPU. It’s a platform-specific library which includes very low-level primitives for dealing with the GPU, and is implemented entirely by ioctl()s on a file descriptor (which I described how to open earlier). The library has one section for each card it supports, and you (obviously) shouldn’t use the API calls for a card that you do not have. There is one .h file which isn’t in a platform-specific directory (xf86drm.h), but the i965 Mesa driver never seems to call these non-platform-specific functions (except for drmGetMagic(), as described above). Instead, everything seems to come from inside the intel folder.

Once you’re at the level of DRM, the nouns and verbs of OpenGL are no longer relevant, since you’re dealing directly with the card. Indeed, the i965 OpenGL driver even has some nouns which DRM doesn’t know about. DRM has a concept of a buffer on the card, and these buffers are used for everything. It’s straightforward to see how a VBO simply uses a buffer on the card, but a compiled program is also simply placed inside a buffer on the card. A frame buffer is simply a buffer on the card. A command buffer, which OpenGL has no notion of, but the i965 OpenGL driver does (and calls a batch buffer), is simply a buffer on the card. A texture is simply a buffer on the card. Therefore, the part of DRM that handles buffers is the most important part. The API to this subsystem lives in intel_bufmgr.h, and I’ll list a few of the more interesting calls here:
  • drm_intel_bufmgr_gem_init() creates a drm_intel_bufmgr*, which contains a vtable inside it that the buffer-specific calls go through
  • drm_intel_bufmgr_destroy(): the reverse of the above
  • drm_intel_bufmgr_set_debug(), once called, will set state which will cause successive functions to dump debug output.
  • drm_intel_bo_alloc() makes a drm_intel_bo*, which represents a buffer on the card. This call allocates the buffer.
  • drm_intel_bo_gem_create_from_name(): you pass in the name you got from xcb_dri2_get_buffers(), and it will return a drm_intel_bo* which represents that buffer. This is how you interact with the framebuffer.
  • drm_intel_bo_reference() / drm_intel_bo_unreference(): Buffer objects are reference-counted
  • drm_intel_bo_map() / drm_intel_bo_unmap(): self-explanatory
  • drm_intel_bo_subdata(): Upload data to the buffer at a particular offset
  • drm_intel_bo_get_subdata(): Download data from the buffer at a particular offset
  • drm_intel_bo_exec() / drm_intel_bo_mrb_exec(): Execute the contents of the buffer. This is what performs the glDrawElements() call.
  • drm_intel_bo_set_tiling() / drm_intel_bo_get_tiling(): Intel cards have hardware support for tiling. This is used when the buffer represents a renderbuffer or texture.
As you can start to see, these function calls are all that are really necessary for implementing the core part of the i965 OpenGL device driver. Uploading a texture is simply a drm_intel_bo_alloc() and a drm_intel_bo_subdata(). Using a shader is simply a CPU-side compilation, then an alloc/subdata to get it on the card. For a command buffer, you can map it, then write commands into the buffer. When you’re done, exec it.
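To show the shape of the alloc / subdata / reference-count pattern without needing a GPU, here is a mock in plain C. It is emphatically not libdrm: the mock_bo names are invented, the "card memory" is just heap memory, and the real calls take a bufmgr and go to the kernel instead.

```c
#include <stdlib.h>
#include <string.h>

/* Mock buffer object: CPU memory standing in for a buffer on the card. */
struct mock_bo {
    unsigned long size;
    int refcount;
    unsigned char *storage;
};

/* Allocate a zero-filled buffer with an initial reference count of 1. */
struct mock_bo *mock_bo_alloc(unsigned long size)
{
    struct mock_bo *bo = malloc(sizeof *bo);
    if (bo == NULL)
        return NULL;
    bo->storage = calloc(1, size ? size : 1);
    if (bo->storage == NULL) {
        free(bo);
        return NULL;
    }
    bo->size = size;
    bo->refcount = 1;
    return bo;
}

void mock_bo_reference(struct mock_bo *bo)
{
    bo->refcount++;
}

/* Drop one reference. Returns 1 if this was the last reference and the
 * buffer was freed, 0 otherwise. */
int mock_bo_unreference(struct mock_bo *bo)
{
    if (--bo->refcount > 0)
        return 0;
    free(bo->storage);
    free(bo);
    return 1;
}

/* "Upload": copy size bytes into the buffer at offset. Returns 0 on
 * success, -1 if the range does not fit inside the buffer. */
int mock_bo_subdata(struct mock_bo *bo, unsigned long offset,
                    unsigned long size, const void *data)
{
    if (offset > bo->size || size > bo->size - offset)
        return -1;
    memcpy(bo->storage + offset, data, size);
    return 0;
}

/* "Download": copy size bytes out of the buffer starting at offset. */
int mock_bo_get_subdata(struct mock_bo *bo, unsigned long offset,
                        unsigned long size, void *data)
{
    if (offset > bo->size || size > bo->size - offset)
        return -1;
    memcpy(data, bo->storage + offset, size);
    return 0;
}
```

In this model, "uploading a texture" is mock_bo_alloc() followed by mock_bo_subdata(), and whoever holds the last reference frees the buffer by calling mock_bo_unreference().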

These calls are all passed almost directly to the kernel, meaning that this is about as far down as we can go and still be in userland.

All this information is enough to write a program that uses the GPU directly but doesn’t go through Mesa. Here are the steps:
  1. xcb_connect() to the X server.
  2. xcb_dri2_connect() to get the DRI device file path.
  3. xcb_create_window() to make your window.
  4. xcb_map_window() to show it
  5. At this point, you should wait for an expose event
  6. xcb_dri2_create_drawable() to make it available from DRI
  7. xcb_dri2_get_buffers() with XCB_DRI2_ATTACHMENT_BUFFER_BACK_LEFT to get the name of the buffer that backs the window
  8. open() the DRI device file path
  9. drmGetMagic() to get the magic number which you will use to call…
  10. xcb_dri2_authenticate(). After this point, you can call drm functions.
  11. drm_intel_bufmgr_gem_init() to make an intel-specific bufmgr
  12. drm_intel_bo_gem_create_from_name() with the output of xcb_dri2_get_buffers() that you previously called, to create an Intel-specific buffer object which represents the backbuffer of the screen.
  13. drm_intel_bo_map() to map the buffer
  14. use the backbuffer->virtual pointer to write into the frame buffer. You probably should call drm_intel_bo_get_tiling() so you know where to write stuff.
  15. drm_intel_bo_unmap() when you’re done
  16. drm_intel_bo_unreference() when you don’t need the drm_intel_bo anymore
  17. xcb_dri2_swap_buffers() to tell the X server that you’ve finished the frame
  18. drm_intel_bufmgr_destroy() when you’re done with the bufmgr
  19. close() the DRM file descriptor
  20. xcb_dri2_destroy_drawable() when you’re done with DRI2 entirely
  21. xcb_disconnect() from the X server entirely
And there you go. You can see how you might take this structure, but instead of mapping the frame buffer and drawing into it with the CPU, you could allocate a command buffer and write commands into it, and get the GPU to draw into the frame buffer for you.
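For step 14, the pitch and CPP values returned by xcb_dri2_get_buffers() are what tell you where a pixel lives. Here is a sketch of that arithmetic, under the assumption that the buffer is linear (untiled); a tiled buffer needs a different address calculation, which this does not attempt. The function names are hypothetical, and the mapped pointer is whatever you got from mapping the buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte offset of pixel (x, y) in a linear buffer: pitch is the number of
 * bytes between successive rows, cpp is the number of bytes per pixel. */
size_t sketch_linear_offset(uint32_t x, uint32_t y, uint32_t pitch, uint32_t cpp)
{
    return (size_t)y * pitch + (size_t)x * cpp;
}

/* Write one 32-bit pixel (so cpp is assumed to be 4) through a mapped
 * pointer. The caller must make sure (x, y) lies inside the buffer. */
void sketch_put_pixel(void *mapped, uint32_t pitch,
                      uint32_t x, uint32_t y, uint32_t pixel)
{
    unsigned char *base = mapped;
    memcpy(base + sketch_linear_offset(x, y, pitch, 4), &pixel, sizeof pixel);
}
```

With a pitch of 64 bytes and 4 bytes per pixel, pixel (3, 2) starts at byte 2 * 64 + 3 * 4 = 140.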

[1] https://www.freebsd.org/releases/10.1R/announce.html
[2] http://www.mesa3d.org/relnotes/10.3.3.html
[3] http://xcb.freedesktop.org
[4] http://dri.freedesktop.org/wiki/
