Monday, August 22, 2016

OpenGL on macOS

OpenGL is a specification created by a cross-vendor group, and is designed to work on all (fairly modern) graphics cards. While this sounds obvious, it actually has some interesting implications. It means that nothing platform-specific is inside the OpenGL spec itself. Instead, only the common pieces are inside the spec.

In addition, technically, OpenGL is not a piece of software. OpenGL is a document designed for humans to read. There are many libraries written by many people which claim to implement this spec, but it’s important to realize that these libraries are not OpenGL itself. There can be problems with an individual implementation, and there can be problems with the spec, and those are separate problems.

OpenGL operates inside a “context” which is “current” to a thread. However, the spec doesn’t include any way of interacting with this context directly (like creating it or making it current). This is because each platform has its own way of creating this context. On macOS, this is done with the CGL (Core OpenGL) framework.

Another example of something not existing in the spec is the issue of device memory availability. The OpenGL spec does not provide any way to ask how much memory is available or used on the device. This is because GPUs can be implemented with many different regions of memory with different performance characteristics. For example, many GPUs have a separate area where constant memory or texture memory lives. On the other hand, an integrated GPU uses main memory, which is shared with regular applications, so the whole concept of available graphics memory doesn’t make a lot of sense. (Also, imagine a theoretical GPU with automatic memory compression.) Indeed, these varied memory architectures are incredibly valuable, and GPU vendors should be able to innovate in this space. If a way to ask for available memory were added to the spec, it would either 1) be simple but meaningless on many GPUs with varied memory architectures, or 2) be so generic and nebulous that it would be impossible for a program to make any actionable decisions at runtime. The lack of such an API is actually a success, not an oversight. If you are running on a specific GPU whose memory architecture you understand, perhaps the vendor of that GPU can give you a vendor-specific API to answer these kinds of questions in a platform-specific way. However, such an API would only work on that specific GPU.

Another example is the idea of “losing” a context. Most operating systems include mechanisms which will cause your OpenGL context to become invalid, or “lost.” Each operating system has its own reasons a context may be lost, and its own ways to listen for the events which signal it. Similar to context creation, this concept falls squarely in the “platform-dependent” bucket. Therefore, the spec itself just assumes your context is valid, and it is the programmer’s responsibility to make sure that’s true on any specific operating system.

As mentioned above, OpenGL contexts on macOS are created and managed directly with CGL (or via its higher-level NSOpenGL* wrappers). There are a few concepts involved in using CGL:
  • Pixel Formats
  • Renderers
  • Virtual Screens
  • Contexts

A context is the thing you need to run OpenGL functions. In order to create a context, you need to specify a pixel format. This is a configuration of the external resources the context will be able to access. For example, you can say things like “Make a double-buffered color buffer with 8 bits per channel, along with an 8-bit depth buffer.” This information needs to be specified on the context itself (and is therefore not in the OpenGL spec, because it’s platform-specific) because there is a relationship between what you specify here and the integration with the rest of the machine. For example, you can only successfully create a context with a pixel format that the window server understands, because at the end of the day, the window server needs to composite the output of your OpenGL rendering with the rest of the windows on the system. (This is also the reason why there’s no “present” call in the OpenGL spec - it requires interaction with the platform-specific window server.)

Because the pixel format attributes also act as configuration parameters for the renderer in general, this is also the place where you specify things like which version of OpenGL the context should support (which is necessary because newer OpenGL versions deprecate some functionality and increasingly move things from ARB extensions into core). Parameters like this one don’t affect the format of the pixels, per se, but they do affect the selection of the CGL renderer used to implement the OpenGL functions.
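
To make this concrete, here is a minimal sketch of creating a context directly with CGL, assuming you want an 8-bit-per-channel, double-buffered color buffer, a 24-bit depth buffer, and a 3.2 Core Profile context (the exact attributes are just an example):

    #include <OpenGL/OpenGL.h>

    // Sketch: choose a pixel format, create a context from it, and make it
    // current on this thread. Error handling is minimal.
    static CGLContextObj createContext(void) {
        CGLPixelFormatAttribute attributes[] = {
            kCGLPFAOpenGLProfile, (CGLPixelFormatAttribute)kCGLOGLPVersion_3_2_Core,
            kCGLPFAColorSize,     (CGLPixelFormatAttribute)24,
            kCGLPFADepthSize,     (CGLPixelFormatAttribute)24,
            kCGLPFADoubleBuffer,
            (CGLPixelFormatAttribute)0
        };

        CGLPixelFormatObj pixelFormat = NULL;
        GLint virtualScreenCount = 0;
        if (CGLChoosePixelFormat(attributes, &pixelFormat, &virtualScreenCount) != kCGLNoError)
            return NULL;

        CGLContextObj context = NULL;
        CGLCreateContext(pixelFormat, NULL, &context);
        CGLReleasePixelFormat(pixelFormat);
        CGLSetCurrentContext(context);
        return context;
    }

Note how a virtual screen count falls out of pixel format creation; that number describes the set of virtual screens a context made from this pixel format can use.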

A CGL renderer is conceptually similar to a vtable which backs the OpenGL drawing commands. There is a software renderer, as well as a renderer provided by the GPU driver. On a MacBook Pro with both an integrated and discrete GPU, different renderers are used for each one. A renderer can operate on one or more virtual screens, which are conceptually similar to physical screens attached to the machine, but generalized (virtualized) so it is possible to, for example, have a virtual screen that spans across two physical screens. There is a relationship between CGDisplayIDs and OpenGL virtual screens, so it’s possible to map back and forth between them. This means that you can get semantic knowledge of an OpenGL renderer based on existing context in your program. It’s possible to iterate through all the renderers on the system (and their relationships with virtual screens) and then use CGL to query attributes about each renderer.
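
For example, here is a rough sketch of iterating the renderers on the system with CGL and asking each one a few questions (which properties you query is up to you):

    #include <OpenGL/OpenGL.h>
    #include <stdio.h>

    // Sketch: enumerate every renderer on the system and print a few of its
    // properties. The display mask of all 1s means "renderers for any display."
    static void dumpRenderers(void) {
        CGLRendererInfoObj rendererInfo = NULL;
        GLint rendererCount = 0;
        if (CGLQueryRendererInfo(0xFFFFFFFF, &rendererInfo, &rendererCount) != kCGLNoError)
            return;

        for (GLint i = 0; i < rendererCount; i++) {
            GLint rendererID = 0, accelerated = 0, online = 0;
            CGLDescribeRenderer(rendererInfo, i, kCGLRPRendererID, &rendererID);
            CGLDescribeRenderer(rendererInfo, i, kCGLRPAccelerated, &accelerated);
            CGLDescribeRenderer(rendererInfo, i, kCGLRPOnline, &online);
            printf("Renderer 0x%x: accelerated=%d online=%d\n",
                   rendererID, accelerated, online);
        }
        CGLDestroyRendererInfo(rendererInfo);
    }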

A CGL context has a set of renderers that it may use for rendering. (This set can have more than one object in it.) The context may decide to migrate from one renderer to another. When this happens, the context the application uses doesn’t change; instead, if you query the context for its current renderer, it will simply reply with a different answer.

(Side note: it’s possible to create an OpenGL context where you specify exactly one renderer to use with kCGLPFARendererID. If you do this, the renderer won’t change; however, the virtual screen can change if, for example, the user drags the window to a second monitor attached to the same video card.)
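
As a sketch, reading the current renderer and virtual screen back out of an existing context looks roughly like this:

    #include <OpenGL/OpenGL.h>
    #include <stdio.h>

    // Sketch: ask a context which virtual screen and renderer it is currently
    // using. If the context migrates, calling this again returns new values.
    static void logCurrentRenderer(CGLContextObj context) {
        GLint virtualScreen = 0;
        CGLGetVirtualScreen(context, &virtualScreen);

        GLint rendererID = 0;
        CGLGetParameter(context, kCGLCPCurrentRendererID, &rendererID);

        printf("Virtual screen %d, renderer 0x%x\n", virtualScreen, rendererID);
    }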

This causes something of a problem. Inside a single context, the system may decide to switch you to a different renderer, but different renderers have different capabilities. Therefore, if you were relying on the specific capabilities of the current renderer, you may have to change your program logic if the renderer changes. Similarly, even if the renderer doesn’t change but the virtual screen does, your program may also need to alter its logic if it was relying on specific traits of the screen. Luckily, if the renderer changes, then the virtual screen will also change (even on a MacBook Pro with integrated & discrete GPU switching).

On macOS, the only supported way to show something on the screen is to use Cocoa (NSWindow / NSView, etc.). Therefore, using NSOpenGLView with NSOpenGLContext is a natural fit. The best part of NSOpenGLView is that it provides an “update” method which you can override in a subclass. Cocoa will call this update method any time the view’s format changes. For example, if you drag a window from a 1x screen to a 2x screen, Cocoa will call your “update” method, because you need to be aware that the format changed. Inside the “update” function, you’re supposed to investigate the current state of the world (including the current renderer / format / virtual screen, etc.), figure out what changed, and react accordingly.
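
A sketch of such a subclass (the class name and what you do inside “update” are placeholders; the point is that this method is your hook for reacting to format changes):

    #import <Cocoa/Cocoa.h>
    #import <OpenGL/OpenGL.h>

    @interface MyOpenGLView : NSOpenGLView
    @end

    @implementation MyOpenGLView

    - (void)update {
        [super update]; // let NSOpenGLView update the underlying NSOpenGLContext

        // Investigate the current state of the world and react to what changed.
        GLint virtualScreen = [[self openGLContext] currentVirtualScreen];
        NSSize backingSize = [self convertSizeToBacking:self.bounds.size];
        NSLog(@"update: virtual screen %d, backing size %@",
              virtualScreen, NSStringFromSize(backingSize));

        [self setNeedsDisplay:YES];
    }

    @end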

This means that using the “update” method on NSOpenGLView is how you support Hi-DPI screens. You also need to opt in to Hi-DPI support using wantsBestResolutionOpenGLSurface. If you don’t do this and you’re using a 2x display, your OpenGL content will be rendered at 1x and then stretched across the relevant portion of the 2x display. (This stretching is the default precisely so that calls like glReadPixels() keep working without any coordinate mapping.) You can convert between these logical coordinates and the 2x pixel coordinates by using the convert*ToBacking methods on NSView. Therefore, if you opt in to supporting 2x screens, all your calls which interact with pixels directly, like glReadPixels(), will need to be updated.
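
For instance, the opt-in and the coordinate conversion might look roughly like this (a sketch; these methods would live in an NSOpenGLView subclass like the one above):

    // (the file containing these methods also needs #import <OpenGL/gl.h> for glViewport)

    - (void)awakeFromNib {
        [super awakeFromNib];
        // Opt in to rendering at the full backing resolution on Retina displays.
        self.wantsBestResolutionOpenGLSurface = YES;
    }

    - (void)drawRect:(NSRect)dirtyRect {
        [[self openGLContext] makeCurrentContext];

        // Convert logical (point) coordinates into pixel coordinates for OpenGL.
        NSRect backingBounds = [self convertRectToBacking:self.bounds];
        glViewport(0, 0, (GLsizei)backingBounds.size.width,
                         (GLsizei)backingBounds.size.height);

        // ... draw ...

        [[self openGLContext] flushBuffer]; // for a double-buffered pixel format
    }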

Similarly, NSOpenGLView has a property which supports wide-gamut color: wantsExtendedDynamicRangeOpenGLSurface. There is an explanatory comment next to this property which describes how normally colors are clipped in the 0.0 - 1.0 range, but if you set this boolean, the maximum clipping value may increase to something larger than 1.0 depending on which monitor you’re using. You can query this by asking the NSScreen for its maximumExtendedDynamicRangeColorComponentValue. Similar to before, the update method should be called whenever anything relevant here changes, thereby giving you an opportunity to investigate what changed and react accordingly.
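
A sketch of the opt-in and the query (the method name here is made up; call it wherever your setup code lives, inside a view subclass):

    // Sketch; this would live in an NSView/NSOpenGLView subclass.
    - (void)configureExtendedDynamicRange {
        // Opt in to extended dynamic range for this view's OpenGL surface.
        self.wantsExtendedDynamicRangeOpenGLSurface = YES;

        // Ask the current screen how far above 1.0 color components may go.
        CGFloat maxComponent =
            self.window.screen.maximumExtendedDynamicRangeColorComponentValue;
        NSLog(@"Maximum EDR color component: %f", maxComponent);
    }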

However, if you increase the color gamut (think: the outer boundary of representable colors) your numbers are supposed to span, one of two things has to happen:
  • You keep the same number of representable values as before, but spread each representable value farther from its neighbors (so that the same number of representable values spans the larger space)
  • You add more representable values to keep the density of representable values the same (or higher!) than before.

The first option sucks because the distance between adjacent representable values is already fairly close to the minimum perception threshold of our eyes. Therefore, if you increase the distance between adjacent representable values, these “adjacent” colors actually start looking fairly distinct to us humans. The effect becomes obvious if you look at what should be a smooth gradient, because you see bands of solid color instead of a smooth transition.

The second option sucks because more representable values means more information, which means your numbers have to be held in more bits. More bits means more memory is required.

Usually, the best solution is to pay for the additional information in one of two ways: either repurpose the alpha channel bits for the color channels and go to a 10-bit/10-bit/10-bit/2-bit pixel format (which means you use the same amount of memory but give up alpha fidelity), or go to a half-float (16-bit) pixel format, which means your memory use doubles (since each channel before was 8-bit and now you’re going to 16-bit). Therefore, if you want to use wide color, you probably want deep color, which means you should specify an appropriate deep-color pixel format attribute when you create your OpenGL context. You probably want to specify NSOpenGLPFAColorFloat as well as NSOpenGLPFAColorSize 64. Note that, if you don’t use a floating point pixel format (meaning: you use a regular integral pixel format), you do get additional fidelity, but you might not be able to represent values outside of the 0.0 - 1.0 range, depending on how the integral units map to the color space (which I don’t know).
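
A sketch of requesting such a format through NSOpenGLPixelFormat (the particular attribute combination here is just one plausible choice):

    #import <Cocoa/Cocoa.h>

    // Sketch: request a half-float, 64-bits-per-pixel ("deep color") format.
    static NSOpenGLPixelFormat *createDeepColorPixelFormat(void) {
        NSOpenGLPixelFormatAttribute attributes[] = {
            NSOpenGLPFAOpenGLProfile, NSOpenGLProfileVersion3_2Core,
            NSOpenGLPFAColorFloat,        // floating-point color components
            NSOpenGLPFAColorSize, 64,     // 16 bits per channel, RGBA
            NSOpenGLPFADoubleBuffer,
            0
        };
        return [[NSOpenGLPixelFormat alloc] initWithAttributes:attributes];
    }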

There’s one other interesting piece of tech released in the past few years - a MacBook Pro with two GPUs (one integrated and one discrete) will switch between them based on which apps are running and which contexts have been created across the entire system. This switch occurs for all apps, which means that one app can cause the active GPU to change for all the existing apps. As mentioned before, this means that the renderer inside your OpenGL context could change at an arbitrary time, which means a well-behaved app should listen for these changes and respond accordingly. However, not all existing apps do this, which is why the switching behavior is entirely opt-in. If any app is running which doesn’t understand this switching behavior, the system will simply pick a GPU (the discrete one) and force the entire system to use it until the app closes (or, if more than one naive app is running, until they all close). Therefore, no switches will occur while these apps are running, and the apps can run in peace. However, keeping the discrete GPU running for a long time is a battery drain, so it’s valuable to teach your apps how to react correctly to a GPU switch.

Unfortunately, I’ve found that Cocoa doesn’t call NSOpenGLView’s “update” method when one of these GPU switches occurs. The switch is modeled in OpenGL as a change of the virtual screen of the OpenGL context. You can listen for a virtual screen change in two possible ways:
  • Add an observer to the default NSNotificationCenter to listen for the NSWindowDidChangeScreenNotification
  • Use CGDisplayRegisterReconfigurationCallback

If you’re rendering to the screen, then using NSNotificationCenter should be okay because you’re using Cocoa anyway (because the only way to render to the screen is by using Cocoa). There’s no way to associate a CGL context directly with an NSView without going through NSOpenGLContext. If you’re not rendering to the screen, then presumably you wouldn’t care which GPU is outputting to the screen.

Inside these callbacks, you can simply read the currentVirtualScreen property on the view’s NSOpenGLContext (or use CGLGetVirtualScreen() - Cocoa will automatically call the setter when necessary). Once you’ve detected a virtual screen change, you should probably re-render your scene because the contents of your view will be stale.
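
Here is a rough sketch of the NSNotificationCenter approach; glView stands in for your NSOpenGLView, and in real code you would hold on to the returned observer token so you can remove it later:

    #import <Cocoa/Cocoa.h>
    #import <OpenGL/OpenGL.h>

    // Sketch: when the window changes screens, re-check the virtual screen and
    // redraw, since the old contents are stale.
    static void observeScreenChanges(NSOpenGLView *glView) {
        [[NSNotificationCenter defaultCenter]
            addObserverForName:NSWindowDidChangeScreenNotification
                        object:glView.window
                         queue:[NSOperationQueue mainQueue]
                    usingBlock:^(NSNotification *note) {
            GLint virtualScreen = [[glView openGLContext] currentVirtualScreen];
            NSLog(@"Virtual screen is now %d", virtualScreen);
            [glView setNeedsDisplay:YES];
        }];
    }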

After you’ve implemented support for switching GPUs, you then have to tell the system that the support exists, so that it won’t take the legacy approach of choosing one GPU for the lifetime of your app. You can do this either by setting NSSupportsAutomaticGraphicsSwitching = YES in your Info.plist inside your app’s bundle, or, if you’re using CGL, you can use the kCGLPFASupportsAutomaticGraphicsSwitching pixel format attribute when you create the context. Luckily, CGLPixelFormatObj and NSOpenGLPixelFormat can be freely converted between (likewise with CGLContextObj and NSOpenGLContext).

Now that you’ve told the system you know how to switch GPUs, the system won’t force you to use the discrete GPU. However, if you naively create an OpenGL context, you will still get the discrete GPU by default. What you’ve gained is the ability to specify that you would prefer the integrated GPU, which you do by specifying that you are willing to use an “offline” renderer (NSOpenGLPFAAllowOfflineRenderers).
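
A sketch of a CGL attribute list that both declares support for automatic graphics switching and allows the integrated (“offline”) renderer to be chosen:

    #include <OpenGL/OpenGL.h>

    // Sketch: a pixel format for a context that tolerates GPU switching and is
    // allowed to run on the integrated GPU.
    static CGLPixelFormatObj createSwitchablePixelFormat(void) {
        CGLPixelFormatAttribute attributes[] = {
            kCGLPFAOpenGLProfile, (CGLPixelFormatAttribute)kCGLOGLPVersion_3_2_Core,
            kCGLPFAColorSize,     (CGLPixelFormatAttribute)24,
            kCGLPFADoubleBuffer,
            kCGLPFAAllowOfflineRenderers,              // permit the integrated GPU
            kCGLPFASupportsAutomaticGraphicsSwitching, // we promise to handle switches
            (CGLPixelFormatAttribute)0
        };

        CGLPixelFormatObj pixelFormat = NULL;
        GLint virtualScreenCount = 0;
        CGLChoosePixelFormat(attributes, &pixelFormat, &virtualScreenCount);
        return pixelFormat;
    }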

So far, I’ve discussed how we go about rendering into an NSView. However, there are a few other destinations we can render into.

The first is: no rendering destination at all. This is considered an “offscreen” context. You can create one of these contexts by never attaching the context to a view (something NSOpenGLView would otherwise do for you). One way to do this is to simply create the context with CGL, and then never touch NSOpenGLView.

Why would you want to do this? Because OpenGL commands you run inside an offscreen context still execute. You can use your newly constructed context to create a framebuffer object and render into an OpenGL renderbuffer. Then, you can read the results out of the renderbuffer with glReadPixels(). If your goal is rendering a 3D scene but you aren’t interested in outputting it on a screen, this is the way to do it.
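
A sketch of that flow, assuming a core profile CGL context is already current on this thread (error checking omitted):

    #include <OpenGL/gl3.h>
    #include <stdlib.h>

    // Sketch: render offscreen into a renderbuffer-backed FBO, then read the
    // pixels back. The caller owns the returned buffer.
    static void *renderOffscreen(GLsizei width, GLsizei height) {
        GLuint framebuffer = 0, renderbuffer = 0;
        glGenFramebuffers(1, &framebuffer);
        glBindFramebuffer(GL_FRAMEBUFFER, framebuffer);

        glGenRenderbuffers(1, &renderbuffer);
        glBindRenderbuffer(GL_RENDERBUFFER, renderbuffer);
        glRenderbufferStorage(GL_RENDERBUFFER, GL_RGBA8, width, height);
        glFramebufferRenderbuffer(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                                  GL_RENDERBUFFER, renderbuffer);

        // ... issue your normal drawing commands here ...
        glClearColor(0.0f, 0.0f, 0.0f, 1.0f);
        glClear(GL_COLOR_BUFFER_BIT);

        // Read the result back into CPU memory.
        void *pixels = malloc((size_t)width * (size_t)height * 4);
        glReadPixels(0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, pixels);

        glDeleteRenderbuffers(1, &renderbuffer);
        glDeleteFramebuffers(1, &framebuffer);
        return pixels;
    }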

Another destination is a CoreAnimation layer. In order to do this, you would use a CAOpenGLLayer or NSOpenGLLayer. The layer owns and creates the OpenGL context and pixel format; however, it does this with input from you. The idea is that you would subclass CAOpenGLLayer/NSOpenGLLayer and override the copyCGLPixelFormatForDisplayMask: method (and/or the copyCGLContextForPixelFormat: method). When CoreAnimation wants to create its context, it will call these methods. By supplying the pixel format method, you can specify that, for example, you want an OpenGL version 4 context rather than a version 2 context. Then, when CoreAnimation wants you to render, it will call a draw method which you should override in your subclass and perform any drawing you prefer. By default, it will only ask you to draw in response to setNeedsDisplay, but you can set the “asynchronous” flag to ask CoreAnimation to continually ask you to draw.
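
A sketch of such a subclass, asking for a 3.2 core profile context and doing trivial drawing (the class name and the drawing are placeholders):

    #import <QuartzCore/QuartzCore.h>
    #import <OpenGL/OpenGL.h>
    #import <OpenGL/gl3.h>

    @interface MyGLLayer : CAOpenGLLayer
    @end

    @implementation MyGLLayer

    - (CGLPixelFormatObj)copyCGLPixelFormatForDisplayMask:(uint32_t)mask {
        // Fold the display mask CoreAnimation hands us into the pixel format so
        // an appropriate renderer gets picked for the displays we may appear on.
        CGLPixelFormatAttribute attributes[] = {
            kCGLPFADisplayMask,   (CGLPixelFormatAttribute)mask,
            kCGLPFAOpenGLProfile, (CGLPixelFormatAttribute)kCGLOGLPVersion_3_2_Core,
            kCGLPFAColorSize,     (CGLPixelFormatAttribute)24,
            (CGLPixelFormatAttribute)0
        };
        CGLPixelFormatObj pixelFormat = NULL;
        GLint virtualScreenCount = 0;
        CGLChoosePixelFormat(attributes, &pixelFormat, &virtualScreenCount);
        return pixelFormat;
    }

    - (void)drawInCGLContext:(CGLContextObj)context
                 pixelFormat:(CGLPixelFormatObj)pixelFormat
                forLayerTime:(CFTimeInterval)layerTime
                 displayTime:(const CVTimeStamp *)displayTime {
        // The context is already current here; just draw.
        glClearColor(0.0f, 0.0f, 0.0f, 1.0f);
        glClear(GL_COLOR_BUFFER_BIT);

        // Calling super flushes the drawing into the layer.
        [super drawInCGLContext:context pixelFormat:pixelFormat
                   forLayerTime:layerTime displayTime:displayTime];
    }

    @end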

Another destination is an IOSurface. An IOSurface is a buffer, typically representing a 2D image, which can live in graphics memory. The interesting part of an IOSurface is that it can be shared across process boundaries; if you do that, you have to implement synchronization between the processes yourself. It’s possible to wrap an OpenGL texture around an IOSurface, which means you can render to an IOSurface with render-to-texture. If you create a framebuffer object, create a texture from the IOSurface using CGLTexImageIOSurface2D(), bind the texture to the framebuffer, then render into the framebuffer, the result is that you render into the IOSurface. You can share a handle to the IOSurface by using IOSurfaceCreateXPCObject(). Then, if you manage synchronization yourself, you can have another process read from the IOSurface by locking it with IOSurfaceLock() and getting a pointer to the mapped data with IOSurfaceGetBaseAddressOfPlane(). Alternatively, you can set it as the “contents” of a CoreAnimation layer, or use it in another OpenGL context in the other process.
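
Finally, a rough sketch of creating an IOSurface and wrapping a rectangle texture around it (the sizes and pixel layout here are illustrative, and a CGL context is assumed to already be current):

    #import <Foundation/Foundation.h>
    #import <IOSurface/IOSurface.h>
    #import <OpenGL/OpenGL.h>
    #import <OpenGL/gl3.h>
    #import <OpenGL/CGLIOSurface.h>

    // Sketch: create a BGRA IOSurface and wrap a rectangle texture around it,
    // so that render-to-texture into this texture lands in the IOSurface.
    static GLuint createTextureForNewIOSurface(GLsizei width, GLsizei height,
                                               IOSurfaceRef *outSurface) {
        NSDictionary *properties = @{
            (__bridge NSString *)kIOSurfaceWidth: @(width),
            (__bridge NSString *)kIOSurfaceHeight: @(height),
            (__bridge NSString *)kIOSurfaceBytesPerElement: @4,
        };
        IOSurfaceRef surface = IOSurfaceCreate((__bridge CFDictionaryRef)properties);
        *outSurface = surface;

        GLuint texture = 0;
        glGenTextures(1, &texture);
        glBindTexture(GL_TEXTURE_RECTANGLE, texture);
        CGLTexImageIOSurface2D(CGLGetCurrentContext(), GL_TEXTURE_RECTANGLE,
                               GL_RGBA, width, height,
                               GL_BGRA, GL_UNSIGNED_INT_8_8_8_8_REV,
                               surface, 0 /* plane */);
        glBindTexture(GL_TEXTURE_RECTANGLE, 0);
        return texture; // attach this to an FBO color attachment to render into it
    }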