Wednesday, November 16, 2016

Single Screen GPU Handoff

Over the past few years, a number of laptops have been released with two graphics cards. The idea is that one is low-power and one is high-power. When you want long battery life, you can use the low-power GPU, but when you want high performance, you can use the high-power GPU. However, there is a wrinkle: the laptop only has one screen.

The screen’s contents have to come from somewhere. One way to implement this system would be to daisy-chain the two GPUs, thereby keeping the screen always plugged into the same GPU. In this system, the primary GPU (which the screen is plugged into) would have to be told to give the results of the secondary GPU to the screen.

A different approach is to connect both GPUs in parallel with a switch between them. The system will decide when to flip the switch between each of the GPUs. When the screen is connected to one GPU, the other GPU can be turned off completely.

The question, then, is how this looks to a user application. I’ll be investigating three different scenarios here. Note that I’m not discussing what happens if you drag a window between two different monitors each plugged into a separate card; instead, I’m discussing the specific hardware which allows multiple graphics cards to display to the same monitor.

OpenGL on macOS

On macOS, you can tell which GPU your OpenGL context is running on by calling glGetString(GL_VENDOR). When you create your context, you declare whether or not you are capable of using the low-power GPU (the high-power GPU is the default). macOS is designed so that if any context requires the high-power GPU, the whole system is flipped to use it. This is observable by using gfxCardStatus. This means that the whole system may switch out from under you while your app is running because of something a completely different app did.

For many apps, this isn’t a problem because macOS will copy your OpenGL resources between the GPUs, which means your app may be able to continue without caring that the switch occurred. This works because the OpenGL context itself survives the switch, but the internal renderer changes. Because the context is still alive, your app can likely continue.

The problem, though, is with OpenGL extensions. Different renderers support different extensions, and app logic may depend on the presence of an extension. On my machine, the high-powered GPU supports both GL_EXT_depth_bounds_test and GL_EXT_texture_mirror_clamp, but the low-powered one doesn’t. Therefore, if an app relies on an extension, and the renderer changes in the middle of operation, the app may malfunction. The way to fix this is to listen for the NSWindowDidChangeScreenNotification in the default NSNotificationCenter. When you receive this notification, re-interrogate the OpenGL context for its supported extensions. Note that switching in both directions may occur - the system switches to the high-power GPU when some other app is launched, and the system switches back when that app is quit.
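To make the "re-interrogate" step concrete, here is a minimal sketch of the bookkeeping, written in Python so that it stays independent of any real OpenGL binding. It assumes you have already fetched a space-separated extension string (the form the legacy glGetString(GL_EXTENSIONS) returns) before and after the switch. The function names are made up for illustration and are not part of any API.

```python
def parse_extensions(ext_string):
    """Split a space-separated GL_EXTENSIONS-style string into a set of names."""
    return set(ext_string.split())


def lost_extensions(before, after, required):
    """Return the required extensions that were available before a GPU
    switch but are missing afterwards, sorted for stable output."""
    return sorted((required & before) - after)


def gained_extensions(before, after):
    """Return extensions that only became available after the switch."""
    return sorted(after - before)
```

If lost_extensions() returns anything after a switch, the app needs to fall back to a code path that doesn't use those extensions.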

You only have to do this if you opt in to running on the low-power GPU. If you don’t opt in, you will run on the high-power GPU, so your app will itself be keeping the system on the high-power GPU, and the system will never switch back while your app is alive.

Metal on macOS

Metal takes a different approach. When you want to create a MTLDevice, you must choose which GPU your device reflects. There is an API call, MTLCopyAllDevices(), which will simply return a list, and you are free to interrogate each device in the list to determine which one you want to run on. In addition, there’s a MTLCreateSystemDefaultDevice() which will simply pick one for you. On my machine, this “default device” isn’t magical - it is simply exactly equal (by pointer equality) to one of the items in the list that MTLCopyAllDevices() returns. On my machine, it returns the high-powered GPU.

However, MTLDevices don’t have the concept of an internal renderer. In fact, even if you cause the system to change the active GPU (using the above approach of making another app create an OpenGL context), your MTLDevice still refers to the same device that it did when you created it.

I was suspicious of this, so I ran a performance test. I created a shader which got 28 fps on the high-powered GPU and 11 fps on the low-powered one. While this program was running on the low-powered GPU, I opened up an OpenGL app which I knew would cause the system to switch to the high-powered GPU, and I saw that the app’s fps didn’t change. Therefore, the Metal device doesn’t migrate to a new GPU when the system switches GPUs.

Another interesting thing I noticed during this experiment was that the Metal app was responsive throughout the entire test. This means that the rendering was being performed on the low-power GPU, but the results were being shown on the high-power GPU. I can only guess that this means that the visual results of the rendering are being copied between GPUs every frame. This would also seem to mean that both GPUs were on at the same time, which seems like it would be bad for battery life.

DirectX 12 on Windows 10

I recently bought a Microsoft Surface Book which has the same kind of setup: one low-power GPU and one high-power GPU. Similarly to Metal, when you create a DirectX 12 context, you have to select which adapter you want to use. IDXGIFactory4::EnumAdapters1() lets you enumerate the adapters one index at a time, and you are free to interrogate them and choose which one you prefer. However, there is no separate API call to get the default adapter; there is simply a convention that the first device in the list is the one you should be using, and that it is the low-power GPU.
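The enumeration pattern looks roughly like the sketch below. EnumAdapters1() takes an index and hands back one adapter per call until it returns DXGI_ERROR_NOT_FOUND, so you loop until that happens. The sketch is Python with a fake factory standing in for IDXGIFactory4. FakeFactory, the dictionary keys, and the "most dedicated memory means high-power" heuristic are all assumptions made for illustration, not part of DXGI.

```python
S_OK = 0
DXGI_ERROR_NOT_FOUND = 0x887A0002


class FakeFactory:
    """Hypothetical stand-in for IDXGIFactory4; adapters are plain dicts."""

    def __init__(self, adapters):
        self._adapters = adapters

    def enum_adapters1(self, index):
        # Mirrors the real calling convention: one adapter per index,
        # DXGI_ERROR_NOT_FOUND once the index runs off the end.
        if index >= len(self._adapters):
            return DXGI_ERROR_NOT_FOUND, None
        return S_OK, self._adapters[index]


def list_adapters(factory):
    """Collect every adapter by calling enum_adapters1 with rising indices."""
    adapters = []
    index = 0
    while True:
        hr, adapter = factory.enum_adapters1(index)
        if hr == DXGI_ERROR_NOT_FOUND:
            break
        adapters.append(adapter)
        index += 1
    return adapters


def pick_adapter(adapters, prefer_high_power):
    """By convention the first adapter is the default (low-power) one.
    Assumption: the adapter with the most dedicated memory is high-power."""
    if prefer_high_power:
        return max(adapters, key=lambda a: a["dedicated_video_memory"])
    return adapters[0]
```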

As I stated above, on macOS, switching to the discrete GPU is all-or-nothing - the screen’s signal is either coming from the high-power GPU or the low-power GPU. I don’t know whether or not this is true on Windows 10 because I don’t know of a way to observe it there.

However, an individual DirectX 12 context won’t migrate between GPUs on Windows 10. This is observable with a test similar to the one described above. Automatic migration occurred on previous versions of Windows, but it doesn’t occur now.

Therefore, the model here is similar to Metal on macOS, so it seems like the visual results of rendering are copied between the two cards, and that both cards are kept on at the same time if there are any contexts executing on the high-power GPU.

However, the Surface Book has an interesting design: the high-power GPU is in the bottom part of the laptop, near the keyboard, and the laptop’s upper (screen) half can separate from the lower half. This means that the high-power GPU can be removed from the system.

Before the machine’s two parts can be separated, the user must press a special button on the keyboard which is more than just a physical switch. It causes software to run which inspects all the contexts on the machine to determine if any app is using the high-powered GPU on the bottom half of the machine. If it is being used by any app, the machine refuses to separate from the base (and shows a pop up asking the user to please quit the app, or presumably just destroy the DirectX context). There is currently no way for the app to react to the button being pressed so that it could destroy its context. Instead, currently, the user must quit the app.

However, it is possible to lose your DirectX context in other ways. For example, if a user connects to your machine via Terminal Services (similar to VNC), the system will switch from a GPU-accelerated environment to a software-rendering environment. To an app, this looks like the call to IDXGISwapChain3::Present() returning DXGI_ERROR_DEVICE_REMOVED or DXGI_ERROR_DEVICE_RESET. Apps should react to this by destroying their device and re-querying the system for the available devices. This sort of thing will also happen when Windows Update updates GPU drivers or when some older Windows versions (before Windows 10) perform a global low-power to high-power (or vice-versa) switch. So, a well-formed app should already be handling the DEVICE_REMOVED error. Unfortunately, this doesn’t help the use case of separating the two pieces of the Surface Book.
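The control flow for handling a lost device can be sketched as follows. This is Python with a fake swap chain, not real Direct3D code. FakeSwapChain, render_frames, and the create_swap_chain callback are hypothetical names, and the callback stands in for "destroy the device, re-query the adapters, and build everything again." Only the two HRESULT values are real.

```python
S_OK = 0
DXGI_ERROR_DEVICE_REMOVED = 0x887A0005
DXGI_ERROR_DEVICE_RESET = 0x887A0007


class FakeSwapChain:
    """Hypothetical stand-in for a swap chain: present() replays a scripted
    list of result codes, then keeps returning S_OK."""

    def __init__(self, results):
        self._results = iter(results)

    def present(self):
        return next(self._results, S_OK)


def render_frames(create_swap_chain, frame_count):
    """Present frame_count frames, rebuilding the swap chain (and, in a real
    app, the device behind it) whenever the device is lost.
    Returns how many rebuilds were needed."""
    swap_chain = create_swap_chain()
    rebuilds = 0
    presented = 0
    while presented < frame_count:
        hr = swap_chain.present()
        if hr in (DXGI_ERROR_DEVICE_REMOVED, DXGI_ERROR_DEVICE_RESET):
            # The old device is gone: start over from adapter selection.
            swap_chain = create_swap_chain()
            rebuilds += 1
            continue
        presented += 1
    return rebuilds
```

A real app would also need to recreate every resource that lived on the old device. The sketch only shows where that work is triggered.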

Thanks to Frank Olivier for lots of help with this post.

Friday, September 30, 2016

Variation Fonts Demo

Try opening this in a recent Safari nightly build.

The first line shows the text with no variations.
The second line animates the weight.
The third line animates the width.
The fourth line animates both.


Thursday, September 22, 2016

Variable Fonts in CSS Draft

Recently, the CSS Working Group in the W3C resolved to pursue adding support for variable fonts within CSS. A draft has been added to the CSS Fonts Level 4 spec. Your questions and comments are extremely appreciated, and will help shape the future of variation fonts support in CSS! Please add them to either a new CSS GitHub issue, tweet at @Litherum, email to, or use any other means to get in contact with anyone at the CSSWG! Thank you very much!

Here is what CSS would look like using the current draft:

1. Use a preinstalled font with a semibold weight:

<div style="font-weight: 632;">hamburgefonstiv</div>

2. Use a preinstalled font with a semicondensed width:

<div style='font-stretch: 83.7%;'>hamburgefonstiv</div>

3. Use the "ital" axis to enable italics

<!-- Note: No change! The browser can enable variation italics automatically. -->
<div style="font-style: italic;">hamburgefonstiv</div>

4. Set the "fancy" axis to 9001:

<div style="font-variation-settings: 'fncy' 9001;">hamburgefonstiv</div>

5. Animate the weight and width axes together:

@keyframes zooming {
    from {
        font-variation-settings: 'wght' 400, 'wdth' 85;
    }
    to {
        font-variation-settings: 'wght' 800, 'wdth' 105;
    }
}

<div style="animation-duration: 3s;
animation-name: zooming;">hamburgefonstiv</div>

6. Use a variation font as a web font (without fallback):

@font-face {
    /* Note that this is identical to what you currently do today! */
    font-family: "VariationFont";
    src: url("VariationFont.otf");
}

<div style="font-family: 'VariationFont';">hamburgefonstiv</div>

7. Use a variation font as a web font (with fallback):

@font-face {
    font-family: 'FancyFont';
    src: url("FancyFont.otf") format("opentype-variations"), url("FancyFont-600.otf") format("opentype");
    font-weight: 600;
    /* Old browsers would fail to parse "615",
       so it would be ignored and 600 remains.
       New browsers would parse it correctly so 615 would win.
       Note that, because of the font selection
       rules, the font-weight descriptor above may
       be sufficient thereby making the font-weight
       descriptor below unnecessary. */
    font-weight: 615;
}

#fancy {
    font-family: "FancyFont";
    font-weight: 600;
    font-weight: 615;
}

<div id="fancy">hamburgefonstiv</div>
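The fallback trick in example 7 relies on ordinary CSS error handling: a declaration that fails to parse is dropped, and the last valid declaration wins. Here is a small Python model of that behavior. The "old" parser accepts only multiples of 100 from 100 to 900 (keyword values like "bold" are left out for brevity). The "new" parser's 1-999 range is an assumption taken from the range example 9 uses, and all function names are made up for illustration.

```python
def parse_weight_old(value):
    """Older parsers: only 100, 200, ..., 900 are valid numeric weights."""
    try:
        number = int(value)
    except ValueError:
        return None
    if 100 <= number <= 900 and number % 100 == 0:
        return number
    return None


def parse_weight_new(value):
    """Variation-aware parsers: any number in the (assumed) 1-999 range."""
    try:
        number = float(value)
    except ValueError:
        return None
    return number if 1 <= number <= 999 else None


def cascade(declarations, parse):
    """Apply declarations in order; invalid ones are ignored, so the last
    declaration that parses successfully is the one that sticks."""
    result = None
    for value in declarations:
        parsed = parse(value)
        if parsed is not None:
            result = parsed
    return result
```

With the declarations "600" followed by "615", the old parser ends up with 600 and the new parser ends up with 615, which is exactly the split the comments in the example describe.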

8. Use two variations of the same variation font

@font-face {
    font-family: "VariationFont";
    src: url("VariationFont.otf");
    font-weight: 400;
}

<div style="font-family: VariationFont; font-weight: 300;">hamburgefonstiv</div>

<div style="font-family: VariationFont; font-weight: 700;">hamburgefonstiv</div>

9. Combine two variation fonts together as if they were a single font: one for weights 1-300 and another for weights 301-999:

@font-face {
    font-family: "SegmentedVariationFont";
    src: url("SegmentedVariationFont-LightWeights.otf");
    font-weight: 1;
}

@font-face {
    /* There is complication here due to the peculiar nature of the font selection rules.
       Note how this block uses the same source file as the block below. */
    font-family: "SegmentedVariationFont";
    src: url("SegmentedVariationFont-HeavyWeights.otf");
    font-weight: 301;
}

@font-face {
    font-family: "SegmentedVariationFont";
    src: url("SegmentedVariationFont-HeavyWeights.otf");
    font-weight: 999;
}

Thursday, September 15, 2016

Saturday, September 3, 2016

OpenGL on iOS

The model of OpenGL on iOS is much simpler than that on macOS. In particular, the context creation routine on macOS is older than the concept of OpenGL frame buffers, which is why it is structured the way that it is. Back then, the model was much simpler: the OS gave you a buffer, and you drew stuff into it. If you wanted to render offscreen, you had to ask the OS to give you an offscreen buffer.

That all changed with frame buffer objects. Now, in OpenGL, you can create your own offscreen render targets, render into them, and when you’re done, read from them (either as a texture or into host memory). This means that there is a conceptual divide between that buffer the OS gives you when you create your context, and the frame buffer objects you have created in your own OpenGL code.

On iOS, the OpenGL infrastructure was created with frame buffer objects in mind. Instead of asking the OS to give you a buffer to render into, you ask the OS to assign a backing store to a render buffer (which is part of a framebuffer). Specifically, you do this after the OpenGL context is created. This means that almost all of those creation parameters are now unnecessary, since most of them define the structure of that buffer the OS gives you. Indeed, on iOS, when you create a context, the only thing you specify is which version of OpenGL ES you want to use.

On iOS, the way you render directly to the screen is with CoreAnimation layers. There is a method on EAGLContext, renderbufferStorage:fromDrawable: which connects an EAGLDrawable with a renderbuffer. Currently, CAEAGLLayer is the only class which implements EAGLDrawable, which means you have to draw into a layer in the CoreAnimation layer tree. (You can also draw into an offscreen IOSurface by wrapping a texture around it and using render-to-texture, as detailed in my previous post).

This model is quite different from CAOpenGLLayer, as used on macOS. Here, you can affect the properties of the drawable by setting the drawableProperties property on the EAGLDrawable.

There is a higher-level abstraction: a GLKView, which subclasses UIView. This class has a GLKViewDelegate which provides the drawing operations. It has properties which let you specify the attributes of the drawable. There’s also the associated GLKViewController which subclasses UIViewController, which has its own GLKViewControllerDelegate. This delegate has an update() method, which is called between frames. The idea is that you shouldn’t need to subclass GLKView or GLKViewController, but you should implement the delegates.

Many iOS devices have retina screens. The programmer has to opt-in to high density screens by setting the contentsScale property of the CAEAGLLayer to whatever UIScreen.nativeScale is set to. If you don’t do this, your view will be stretched and blurry. This also means that you have to take care to update any places where you interact with pixel data directly, like glReadPixels().
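As a rough illustration of why pixel-level calls need updating, here is the arithmetic for sizing a glReadPixels() destination buffer once the layer is rendering at the screen's scale. This is a Python sketch: the function names are made up, and it assumes an RGBA format with 8 bits per channel (4 bytes per pixel).

```python
def backing_size(width_pt, height_pt, scale):
    """Convert a size in points to a size in pixels for a given scale."""
    return round(width_pt * scale), round(height_pt * scale)


def read_pixels_buffer_size(width_pt, height_pt, scale, bytes_per_pixel=4):
    """Bytes needed to read back the whole drawable, assuming RGBA8."""
    width_px, height_px = backing_size(width_pt, height_pt, scale)
    return width_px * height_px * bytes_per_pixel
```

A 320 x 480 point view needs 614,400 bytes at 1x but 2,457,600 bytes at 2x, so a buffer sized in points is four times too small once you opt in.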

iOS devices also support multiple monitors via AirPlay. With AirPlay, an app can render content onto a remote display. However, the model for this is a little different than on macOS: instead of the user dragging a window to another monitor, and the system telling the app about it, the app handles the movement to the external monitor. The system will give you a UIScreenDidConnectNotification / UIScreenDidDisconnectNotification when the user enables AirPlay. Then, you can see that the [UIScreen screens] array has multiple items in it. You can then move a view hierarchy to the external screen by assigning the screen to your UIWindow’s screen property. You can create a new UIWindow by using the regular alloc / initWithFrame constructor and passing in the UIScreen’s bounds. You then set the rootViewController of this new window to whatever you want to show on the external monitor. Therefore, when this occurs, you have the freedom to query the properties of the remote screen (using UIScreen APIs, such as UIScreen.nativeScale) and react accordingly. For example, if you have a retina device but you are moving content to a 1x screen, you can know this by querying the screen at the time you move the window to it.

On macOS, an OpenGL context could have many renderers inside it, with only one being active at any given time. On iOS devices, there is only one GPU, which means there is only one renderer. The model is therefore much simpler: you don’t have to worry about the renderer changing out from under you.

Monday, August 22, 2016

OpenGL on macOS

OpenGL is a specification created by a cross-vendor group, and is designed to work on all (fairly modern) graphics cards. While this sounds obvious, it actually has some interesting implications. It means that nothing platform-specific is inside the OpenGL spec itself. Instead, only the common pieces are inside the spec.

In addition, technically, OpenGL is not a piece of software. OpenGL is a document designed for humans to read. There are many libraries written by many people which claim to implement this spec, but it’s important to realize that these libraries are not OpenGL itself. There can be problems with an individual implementation, and there can be problems with the spec, and those are separate problems.

OpenGL operates inside a “context” which is “current” to a thread. However, the spec doesn’t include any way of interacting with this context directly (like creating it or making it current). This is because each platform has its own way of creating this context. On macOS, this is done with the CGL (Core OpenGL) framework.

Another example of something not existing in the spec is the issue of device memory availability. The OpenGL spec does not list any way to ask the device how much memory is available or used on the device. This is because GPUs can be implemented with many different regions of memory with different performance characteristics. For example, many GPUs have a separate area where constant memory or texture memory lives. On the other hand, an integrated GPU uses main memory, which is shared with regular applications, so the whole concept of available graphics memory doesn’t make a lot of sense. (Also, imagine a theoretical GPU with automatic memory compression.) Indeed, these varied memory architectures are incredibly valuable, and GPU vendors should be able to innovate in this space. If being able to ask for available memory limits were added to the spec, it would either 1) be simple but meaningless on many GPUs with varied memory architectures, or 2) be so generic and nebulous that it would be impossible for a program to make any actionable decisions at runtime. The lack of such an API is actually a success, not an oversight. If you are running on a specific GPU whose memory architecture you understand, perhaps the vendor of that GPU can give you a vendor-specific API to answer these kinds of questions in a platform-specific way. However, this API would only work on that specific GPU.

Another example is the idea of “losing” a context. Most operating systems include mechanisms which will cause your OpenGL context to become invalid, or “lost.” Each operating system has its own affordances for why a context may be lost, or how to listen for events which may cause the context to be lost. Similar to context creation, this concept falls squarely in the “platform-dependent” bucket. Therefore, the spec itself just assumes your context is valid, and it is the programmer’s responsibility to make sure that’s true on any specific operating system.

As mentioned above, OpenGL contexts on macOS are interacted with directly by using CGL (in addition to its higher-level NSOpenGL* wrappers). There are a few concepts involved with using CGL:
  • Pixel Formats
  • Renderers
  • Virtual Screens
  • Contexts

A context is the thing you need to run OpenGL functions. In order to create a context, you need to specify a pixel format. This is a configuration of the external resources the context will be able to access. For example, you can say things like “Make a double-buffered color buffer 8 bits-per-channel, with a similar 8-bit depth buffer.” This information needs to be specified on the context itself (and is therefore not in the OpenGL spec because it’s platform-specific) because there is a relationship between what you specify here and the integration with the rest of the machine. For example, you can only successfully create a context with a pixel format that the window server understands, because at the end of the day, the window server needs to composite the output of your OpenGL rendering with the rest of the windows on the system. (This is also the reason why there’s no “present” call in the OpenGL spec - it requires interaction with the platform-specific window server.)

Because the pixel format attributes also act as configuration parameters to the renderer in general, this is also the place where you specify things like which version of OpenGL the context should support (which is necessary because OpenGL has deprecated some things and increasingly moves things from ARB extensions into core). Parameters like this one don’t affect the format of the pixels, per se, but they do affect the selection of the CGL renderer used to implement the OpenGL functions.

A CGL renderer is conceptually similar to a vtable which backs the OpenGL drawing commands. There is a software renderer, as well as a renderer provided by the GPU driver. On a MacBook Pro with both an integrated and discrete GPU, different renderers are used for each one. A renderer can operate on one or more virtual screens, which are conceptually similar to physical screens attached to the machine, but generalized (virtualized) so it is possible to, for example, have a virtual screen that spans across two physical screens. There is a relationship between CGDisplayIDs and OpenGL virtual screens, so it’s possible to map back and forth between them. This means that you can get semantic knowledge of an OpenGL renderer based on existing context in your program. It’s possible to iterate through all the renderers on the system (and their relationships with virtual screens) and then use CGL to query attributes about each renderer.

A CGL context has a set of renderers that it may use for rendering. (This set can have more than one object in it.) The context may decide to migrate from one renderer to another. When this happens, the context the application uses doesn’t change; instead if you query the context for its current renderer, it will just reply with a different answer.

(Side note: it’s possible to create an OpenGL context where you specify exactly one renderer to use with kCGLPFARendererID. If you do this, the renderer won’t change; however, the virtual screen can change if, for example, the user drags the window to a second monitor attached to the same video card.)

This causes something of a problem. Inside a single context, the system may decide to switch you to a different renderer, but different renderers have different capabilities. Therefore, if you were relying on the specific capabilities of the current renderer, you may have to change your program logic if the renderer changes. Similarly, even if the renderer doesn’t change, but the virtual screen does change, your program may also need to alter its logic if it was relying on specific traits of the screen. Luckily, if the renderer changes, then the virtual screen will also change (even on a MacBook Pro with integrated & discrete GPU switching).

On macOS, the only supported way to show something on the screen is to use Cocoa (NSWindow / NSView, etc.). Therefore, using NSOpenGLView with NSOpenGLContext is a natural fit. The best part of NSOpenGLView is that it provides an “update” method which you can override in a subclass. Cocoa will call this update method any time the view’s format changes. For example, if you drag a window from a 1x screen to a 2x screen, Cocoa will call your “update” method, because you need to be aware that the format changed. Inside the “update” function, you’re supposed to investigate the current state of the world (including the current renderer / format / virtual screen, etc.), figure out what changed, and react accordingly.

This means that using the “update” method on NSOpenGLView is how you support Hi-DPI screens. You also should opt-in to Hi-DPI support using wantsBestResolutionOpenGLSurface. If you don’t do this and you’re using a 2x display, your OpenGL content will be rendered at 1x and then stretched across the relevant portion of the 2x display. You can convert between these logical coordinates and the 2x pixel coordinates by using the convert*ToBacking methods on NSView. By default, this stretching happens so calls like glReadPixels() will still work in the default case even without mapping coordinates to their backing equivalent. (Therefore, if you want to support 2x screens, all your calls which interact with pixels directly, like glReadPixels(), will need to be updated.)

Similarly, NSOpenGLView has a property which supports wide-gamut color: wantsExtendedDynamicRangeOpenGLSurface. There is an explanatory comment next to this property which describes how normally colors are clipped in the 0.0 - 1.0 range, but if you set this boolean, the maximum clipping value may increase to something larger than 1.0 depending on which monitor you’re using. You can query this by asking the NSScreen for its maximumExtendedDynamicRangeColorComponentValue. Similar to before, the update method should be called whenever anything relevant here changes, thereby giving you an opportunity to investigate what changed and react accordingly.

However, if you increase the color gamut (think: boundary threshold color) your numbers are supposed to span, it means that one of two things will happen:
  • You keep the same number of representable values as before, but spread each representable value farther from its neighbors (so that the same number of representable values spans the larger space)
  • You add more representable values to keep the density of representable values the same as (or higher than!) before.

The first option sucks because the distance between adjacent representable values is already fairly close to the minimum perception threshold in our eyes. Therefore, if you increase the distance between adjacent representable values, these “adjacent” colors actually start looking fairly distinct to us humans. The effect becomes obvious if you look at what should be a smooth gradient, because you see bands of solid color instead of the smooth transition.

The second option sucks because more representable values means more information, which means your numbers have to be held in more bits. More bits means more memory is required.

Usually, the best solution is to add more representable values. You can do this either by repurposing the alpha channel bits as color bits and going to a 10-bit/10-bit/10-bit/2-bit pixel format (which uses the same amount of memory, but gives up alpha fidelity), or by going to a half float (16-bit) pixel format, which doubles your memory use (since each channel was 8-bit before and is now 16-bit). Therefore, if you want to use wide color, you probably want deep color, which means you should be specifying an appropriate deep-color pixel format attribute when you create your OpenGL context. You probably want to specify NSOpenGLPFAColorFloat as well as NSOpenGLPFAColorSize 64. Note that, if you don’t use a floating point pixel format (meaning: you use a regular integral pixel format), you do get additional fidelity, but might not be able to represent values outside of the 0.0 - 1.0 range, depending on how the integral values map to the color space (which I don’t know).
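The tradeoff can be put into numbers with a few lines of Python. This sketch assumes integer channels are evenly spaced across their range. The extended maximum of 2.0 used in the usage note is a made-up example value, and the function and constant names are purely illustrative.

```python
RGBA8 = (8, 8, 8, 8)            # the usual 8 bits per channel
RGB10_A2 = (10, 10, 10, 2)      # alpha bits repurposed for color
RGBA16F = (16, 16, 16, 16)      # half float per channel


def bits_per_pixel(channel_bits):
    """Total storage for one pixel, in bits."""
    return sum(channel_bits)


def integer_step(bits, range_max=1.0):
    """Distance between adjacent representable values when an integer
    channel with the given bit depth evenly spans 0.0 to range_max."""
    return range_max / (2 ** bits - 1)
```

Stretching an 8-bit channel to cover 0.0 - 2.0 doubles the step between neighboring values (integer_step(8, 2.0) is twice integer_step(8)), which is where the banding comes from. The 10/10/10/2 format stays at 32 bits per pixel with a step roughly four times smaller, while the half float format costs 64 bits per pixel.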

There’s one other interesting piece of tech released in the past few years: a MacBook Pro with two GPUs (one integrated and one discrete) will switch between them based on which apps are running and which contexts have been created across the entire system. This switch occurs for all apps, which means that one app can cause the screen to change for all the existing apps. As mentioned before, this means that the renderer inside your OpenGL context could change at an arbitrary time, which means a well-behaved app should listen for these changes and respond accordingly. However, not all existing apps do this, which means that the switching behavior is entirely opt-in. This means that if any app is running which doesn’t understand this switching behavior, the system will simply pick a GPU (the discrete one) and force the entire system to use it until the app closes (or, if more than one naive app is running, until they all close). Therefore, no switches will occur when these apps are running, and the apps can run in peace. However, keeping the discrete GPU running for a long time is a battery drain, so it’s valuable to teach your apps how to react correctly to a GPU switch.

Unfortunately, I’ve found that Cocoa doesn’t call NSOpenGLView’s “update” method when one of these GPU switches occurs. The switch is modeled in OpenGL as a change of the virtual screen of the OpenGL context. You can listen for a virtual screen change in two possible ways:
  • Add an observer to the default NSNotificationCenter to listen for the NSWindowDidChangeScreenNotification
  • Use CGDisplayRegisterReconfigurationCallback

If you’re rendering to the screen, then using NSNotificationCenter should be okay because you’re using Cocoa anyway (because the only way to render to the screen is by using Cocoa). There’s no way to associate a CGL context directly with an NSView without going through NSOpenGLContext. If you’re not rendering to the screen, then presumably you wouldn’t care which GPU is outputting to the screen.

Inside these callbacks, you can simply read the currentVirtualScreen property on NSOpenGLContext (or use CGLGetVirtualScreen() - Cocoa will automatically call the setter when necessary). Once you’ve detected a virtual screen change, you should probably re-render your scene because the contents of your view will be stale.

After you’ve implemented support for switching GPUs, you then have to tell the system that the support exists, so that it won’t take the legacy approach of choosing one GPU for the lifetime of your app. You can do this either by setting NSSupportsAutomaticGraphicsSwitching = YES in your Info.plist inside your app’s bundle, or, if you’re using CGL, you can use the kCGLPFASupportsAutomaticGraphicsSwitching pixel format attribute when you create the context. Luckily, CGLPixelFormatObj and NSOpenGLPixelFormat can be freely converted between (likewise with CGLContextObj and NSOpenGLContext).

Now that you’ve told the system you know how to switch GPUs, the system won’t force you to use the discrete GPU. However, if you naively create an OpenGL context, you will still use the discrete GPU by default. It means, however, you now have the ability to specify that you would prefer the integrated GPU. You do this by specifying that you would like an “offline” renderer (NSOpenGLPFAAllowOfflineRenderers).

So far, I’ve discussed how we go about rendering into an NSView. However, there are a few other rendering destinations that we can render into.

The first is: no rendering destination. This is considered an “offscreen” context. You can create one of these contexts by never setting the context’s view (which NSOpenGLView does for you). One way to do this is to simply create the context with CGL, and then never touch NSOpenGLView.

Why would you want to do this? Because OpenGL commands you run inside an offscreen context still execute. You can use your newly constructed context to create a framebuffer object, and render to an OpenGL renderbuffer. Then, you can read the results out of the render buffer with glReadPixels(). If your goal is rendering a 3D scene, but you aren’t interested in outputting it on a screen, this is the way to do it.

Another destination is a CoreAnimation layer. In order to do this, you would use a CAOpenGLLayer or NSOpenGLLayer. The layer owns and creates the OpenGL context and pixel format; however, it does this with input from you. The idea is that you would subclass CAOpenGLLayer/NSOpenGLLayer and override the copyCGLPixelFormatForDisplayMask: method (and/or the copyCGLContextForPixelFormat: method). When CoreAnimation wants to create its context, it will call these methods. By supplying the pixel format method, you can specify that, for example, you want an OpenGL version 4 context rather than a version 2 context. Then, when CoreAnimation wants you to render, it will call a draw method which you should override in your subclass and perform any drawing you prefer. By default, it will only ask you to draw in response to setNeedsDisplay, but you can set the “asynchronous” flag to ask CoreAnimation to continually ask you to draw.

Another destination is an IOSurface. An IOSurface is a buffer which represents a 2D image and which can live in graphics memory. The interesting part of an IOSurface is that it can be shared across process boundaries. If you do that, you have to implement synchronization yourself between the multiple processes. It’s possible to wrap an OpenGL texture around an IOSurface, which means you can render to an IOSurface with render-to-texture. If you create a framebuffer object, create a texture from the IOSurface using CGLTexImageIOSurface2D(), bind the texture to the framebuffer, then render into the framebuffer, the result is that you render into the IOSurface. You can share a handle to the IOSurface by using IOSurfaceCreateXPCObject(). Then, if you manage synchronization yourself, you can have another process read from the IOSurface by locking it with IOSurfaceLock() and getting the pointer to the mapped data with IOSurfaceGetBaseAddressOfPlane(). Alternatively, you can set it as the “contents” of a CoreAnimation layer. Or, you could use it in another OpenGL context in the other process.

Saturday, April 9, 2016

GPU Text Rendering Overview

There are a few different ways to render text using the GPU. I’ll be discussing a few ways here. All these different ways represent general strategies - they can be mixed and matched. Also, this list may not be comprehensive, but I’ll only discuss the approaches that I’m familiar with. Also, note that I’m only interested in computing coverage here - not color.

First: a little background on text. Glyphs are just sequences of bezier paths (I’m ignoring “sbix” glyphs and things like that). I’m only interested in TrueType / OpenType fonts, so the form of the glyphs is given in the ‘glyf’ table or the ‘CFF ‘ table. The ‘glyf’ table only describes quadratic bezier curves, while the ‘CFF ‘ table can describe cubic bezier curves. At first, this may sound like a small difference, but it turns out that math involving cubic bezier curves is way more complicated than math involving quadratic bezier curves. (For example: finding the intersections of two cubic bezier curves involves finding roots of a 9th order polynomial - something which humanity is currently unable to compute in closed form.) Also, the winding-order is different for the two formats: the ‘glyf’ table encodes paths with a non-zero winding-order, while the ‘CFF ‘ table encodes paths with an even/odd winding-order. Subpaths are expected to intersect. This means that you can’t assume that the right-hand-side of a contour is always inside the glyph.

Texture Atlas

The first approach is kind of a hack. The text is still “rendered” on the CPU (the same way it has been done since the 70s), but the final image is uploaded into a texture atlas on the GPU. This approach actually makes the first use of a glyph slower (because of the additional upload step); however, subsequent uses of that glyph are much faster.

If a subsequent use of a glyph is slightly different in position (within a pixel) or size, there are a couple things you can do. If the position is different (one usage has its origin on the left edge of a pixel, and another usage has its origin in the middle), you might be able to get away with simply relying on the GPU’s texture filtering hardware to interpolate the result. If that isn’t good enough for you, you could snap the glyph origins to a few quanta within a pixel, and consider glyphs which differ in this snapped origin to be unique. This approach works similarly for varying glyph sizes - you can either rely on the texture filtering hardware to scale the glyph, or you could snap to a size quantum. (Or both!)
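As a rough sketch of the snapping idea, the cache key for an atlas entry could include the quantized sub-pixel origin and the quantized size. The function name, the choice of 4 sub-pixel steps, and the half-point size step are all assumptions made for illustration, not values any particular text engine uses:

```python
import math

def atlas_key(glyph_id, x_origin, font_size, subpixel_steps=4, size_step=0.5):
    """Build a hypothetical texture-atlas cache key for one glyph usage.

    Usages whose origins fall in the same sub-pixel bucket, and whose sizes
    snap to the same size quantum, share one rasterized atlas entry.
    """
    # Fractional position of the origin within its pixel, in [0, 1).
    frac = x_origin - math.floor(x_origin)
    # Snap that fraction to one of `subpixel_steps` buckets.
    subpixel_bucket = min(int(frac * subpixel_steps), subpixel_steps - 1)
    # Snap the size to the nearest multiple of `size_step`.
    size_bucket = round(font_size / size_step)
    return (glyph_id, size_bucket, subpixel_bucket)

# Two usages that differ only slightly map to the same atlas entry.
same_a = atlas_key(42, 10.10, 12.0)
same_b = atlas_key(42, 33.20, 12.1)
# A usage half a pixel over needs its own entry.
different = atlas_key(42, 10.60, 12.0)
```

More buckets mean more atlas entries (and more uploads) but less reliance on texture filtering.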

Signed Distance Field

Valve published a similar approach which they use in the Team Fortress 2 game. Recall how in the previous approach, the value of each texel is coverage of that texel. Valve’s approach uses the same idea, except that the texture stores a “signed distance field.” This means that the value of each texel is the signed distance from that texel to the closest point on the boundary of the glyph (signed because “inside” values are negative). Using this approach causes bilinear filtering to provide higher-quality results (or, put another way, you can achieve comparable results with fewer texels).
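To make the idea concrete, here is a small CPU-side sketch. It uses a circle as a stand-in for a glyph (an assumption made purely so the exact distance is easy to compute), stores the signed distance at each texel center, and then samples the field bilinearly the way GPU texture filtering would:

```python
import math

SIZE = 8                      # the field is SIZE x SIZE texels
CENTER, RADIUS = (4.0, 4.0), 3.0

def true_distance(x, y):
    """Signed distance to the circle's boundary; negative means inside."""
    return math.hypot(x - CENTER[0], y - CENTER[1]) - RADIUS

# One sample per texel, taken at the texel's center.
field = [[true_distance(i + 0.5, j + 0.5) for i in range(SIZE)]
         for j in range(SIZE)]

def sample_bilinear(x, y):
    """Bilinearly interpolate the stored field at an arbitrary point."""
    fx, fy = x - 0.5, y - 0.5
    i0 = max(0, min(SIZE - 2, math.floor(fx)))
    j0 = max(0, min(SIZE - 2, math.floor(fy)))
    tx, ty = fx - i0, fy - j0
    top = field[j0][i0] * (1 - tx) + field[j0][i0 + 1] * tx
    bottom = field[j0 + 1][i0] * (1 - tx) + field[j0 + 1][i0 + 1] * tx
    return top * (1 - ty) + bottom * ty

def is_inside(x, y):
    """Coverage test: the sign of the interpolated distance."""
    return sample_bilinear(x, y) < 0
```

Because distance varies smoothly near the boundary, the interpolated value stays close to the true distance even between texel centers, which is why a low-resolution field can still produce a crisp edge.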

A newer approach using signed distance fields has been implemented which uses additional color channels in the GPU texture to achieve higher-fidelity results.

Generated Geometry

The next approach is to generate geometry which matches the contours closely. GPUs can only render triangle geometry, so this means that this approach requires triangulating the input curve. One way to do this is to choose a constant triangle size. However, a better idea is to increase the triangle density in areas of high complexity, and to decrease the triangle density in areas of low complexity.

This means you want to subdivide the bezier curves finely where the curve is sharp, and loosely when the curve is loose. Luckily, the De Casteljau method already does this! If you subdivide a Bezier curve with equal parameter intervals using that method, the subdivision points will be closer together where the curve is sharp, and vice-versa.
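The following sketch evaluates a curve at equally spaced parameter values using De Casteljau’s repeated linear interpolation. The control points are made up for illustration: a quadratic curve with a sharp turn at its apex.

```python
import math

def lerp(a, b, t):
    """Linearly interpolate between two 2D points."""
    return (a[0] + (b[0] - a[0]) * t, a[1] + (b[1] - a[1]) * t)

def de_casteljau(control_points, t):
    """Evaluate a Bezier curve of any degree at parameter t."""
    points = list(control_points)
    while len(points) > 1:
        # Each pass replaces n points with n - 1 interpolated points.
        points = [lerp(a, b, t) for a, b in zip(points, points[1:])]
    return points[0]

def subdivide(control_points, segments):
    """Sample the curve at equal parameter intervals."""
    return [de_casteljau(control_points, i / segments)
            for i in range(segments + 1)]

# Hypothetical quadratic curve with a sharp turn at its apex.
curve = [(0.0, 0.0), (1.0, 10.0), (2.0, 0.0)]
points = subdivide(curve, 8)
spacing = [math.dist(a, b) for a, b in zip(points, points[1:])]
# spacing[0] (the nearly straight start) is much larger than
# spacing[3] (next to the apex, where the curve turns sharply).
```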

Once you’ve done the subdivision, you can use a “Constrained Delaunay Triangulation” to actually run the triangulation. This is similar to a regular Delaunay triangulation, except that it can guarantee that particular edges are present in the triangulation. This means you can guarantee that no triangle will cross a contour. Therefore, each triangle can be considered to be entirely inside or entirely outside the glyph, and can be shaded accordingly.

Stencil Buffer Approach

If you don't want to run that triangulation, you can use the GPU’s stencil buffer (or equivalent) to calculate coverage instead. The idea is that you use the subdivision points to model the contours as a sequence of line segments. Then, you pick a point (let’s call it P) way off somewhere (it can be arbitrary), and, for every line segment, form a triangle with that line segment and that point P. When you do that, you’ll have lots of overlapping triangles.

You can then set up the stencil buffer to say “increment the texel’s counter if the triangle you’re shading has positive area, and decrement the texel’s counter if the triangle you’re shading has negative area” (where “negative area” and “positive area” refer to shading the “front” or “back” of the triangle, and is determined by whether the points are submitted in a clockwise or counter-clockwise direction). If you shade all the triangles like this, all the overlapping triangles cancel out, and you're left with nonzero counters in all the places where the glyph lies. You can then set up the stencil buffer to say “only output a value if the stencil buffer has a nonzero value.” Note that this only works for font files which use a nonzero winding order.
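Here is a CPU simulation of that counter for a single sample point, as a sketch only: a real implementation would let the GPU rasterize the triangles and update the stencil buffer. The contour, the pivot P, and the function names are invented for the example, and sample points lying exactly on a triangle edge are not handled specially.

```python
def signed_area2(a, b, c):
    """Twice the signed area of triangle abc (positive if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def point_in_triangle(p, a, b, c):
    """True if p lies inside triangle abc, regardless of its orientation."""
    d1, d2, d3 = signed_area2(a, b, p), signed_area2(b, c, p), signed_area2(c, a, p)
    has_negative = d1 < 0 or d2 < 0 or d3 < 0
    has_positive = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_negative and has_positive)

def stencil_counter(contour, pivot, sample):
    """Simulate the stencil counter at `sample` for a closed polyline."""
    counter = 0
    for i, a in enumerate(contour):
        b = contour[(i + 1) % len(contour)]
        area = signed_area2(pivot, a, b)
        if area == 0:
            continue  # degenerate triangle covers nothing
        if point_in_triangle(sample, pivot, a, b):
            # Front-facing triangles increment, back-facing ones decrement.
            counter += 1 if area > 0 else -1
    return counter

square = [(0, 0), (4, 0), (4, 4), (0, 4)]   # counter-clockwise contour
pivot = (-10, -7)                            # arbitrary far-away point P

inside = stencil_counter(square, pivot, (2, 2))          # nonzero
outside = stencil_counter(square, pivot, (6, 2))         # zero
between = stencil_counter(square, pivot, (-3, -2))       # zero after cancelling
```

The last sample sits between P and the square: it is covered by one front-facing and one back-facing triangle, so the counter returns to zero.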

This approach has the obvious quality/speed tradeoff of the subdivision density. The higher the density, the slower the rendering but the better-looking the results are. Also, you want the subdivision density to be proportional (somewhat) to the font size, since you want the subdivision density to be roughly equal in screen-space for all glyphs rendered. Unfortunately, it’s difficult to use this approach to get high-quality rendering without super tiny triangles.

Loop-Blinn Method

In the first method, glyph coverage information was represented by a texture. In the third method, glyph coverage information was represented by geometry. Another method (called the Loop-Blinn method) can represent glyph coverage information by using mathematical formulas. This method tries to represent a particular contour in a way that can be computed inside a fragment shader.

In particular, in order to draw a contour, you draw a single triangle which encompasses the entire contour. Inside the triangle, you define a scalar field where each point inside the triangle has a scalar value associated with it. You can create this scalar field in such a way that the following attributes hold:

  • Scalar values which are negative are “inside” the contour, and scalar values which are positive are “outside” the contour
  • Calculating a scalar value in the field can be done in closed form by only knowing the relevant point’s location within the triangle, in addition to some interpolated information associated with the vertices of the triangle

This means that, given some vertex attributes, you can run some math in a pixel shader which will tell you if the shaded point is inside or outside the contour. So, for each contour, you consider a single triangle which includes the contour, and you then calculate some magic values to associate with each vertex of the triangle. Then, you shade that triangle with a fairly simple pixel shader which computes a closed-form equation to determine the coverage.

Note that the formulas involved with the Loop-Blinn method are much simpler for quadratic Bezier curves than for cubic Bezier curves. However, the general approach still works for cubic curves - the difference is that the formulas are bigger (and you need to perform an additional classification step).
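For the quadratic case, the “magic values” are small enough to sketch on the CPU. Assuming a single quadratic curve whose triangle is formed by its three control points, the Loop-Blinn formulation assigns the coordinates (0, 0), (1/2, 0), and (1, 1) to the vertices, interpolates them across the triangle, and evaluates u² - v. The Python below stands in for the fragment shader; the helper names and the control points are made up for illustration.

```python
def signed_area2(a, b, c):
    """Twice the signed area of triangle abc."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

# Per-vertex (u, v) values for the start, control, and end points.
VERTEX_UV = ((0.0, 0.0), (0.5, 0.0), (1.0, 1.0))

def loop_blinn_quadratic(p0, p1, p2, sample):
    """Evaluate the scalar field u*u - v at `sample` inside triangle p0 p1 p2.

    Negative values lie between the curve and the chord p0-p2, positive
    values lie between the curve and the control point, zero is on the curve.
    """
    total = signed_area2(p0, p1, p2)
    # Barycentric weights: what the GPU's interpolators would compute.
    w0 = signed_area2(p1, p2, sample) / total
    w1 = signed_area2(p2, p0, sample) / total
    w2 = signed_area2(p0, p1, sample) / total
    u = w0 * VERTEX_UV[0][0] + w1 * VERTEX_UV[1][0] + w2 * VERTEX_UV[2][0]
    v = w0 * VERTEX_UV[0][1] + w1 * VERTEX_UV[1][1] + w2 * VERTEX_UV[2][1]
    return u * u - v

p0, p1, p2 = (0.0, 0.0), (1.0, 2.0), (2.0, 0.0)
on_curve = loop_blinn_quadratic(p0, p1, p2, (1.0, 1.0))      # curve point at t = 0.5
under_curve = loop_blinn_quadratic(p0, p1, p2, (1.0, 0.0))   # midpoint of the chord
over_curve = loop_blinn_quadratic(p0, p1, p2, (1.0, 1.5))    # toward the control point
```

Whether the negative side counts as “inside” the glyph depends on which way the curve bulges relative to the contour, so a real renderer flips the sign for concave segments.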

Also note that this approach still can use the Constrained Delaunay Triangulation, because you still need to generate triangles that lie entirely within the bounds of the glyph. However, there is no need to do the heuristic subdivision like in the previous method; instead, all the triangles of the mesh are created from the control points of the contours themselves.

This means that the quality of the curves is defined by mathematical formulas, which means that it is effectively infinitely scalable. In fact, the information in the glyph contours can be losslessly converted to the information used for the Loop-Blinn method.

Overall, these methods are not monolithic things, and can be used in conjunction with one another. For example, you could use the stencil buffer approach to shade the insides of the glyph, but the Loop-Blinn method to shade the contours themselves (so that you don’t have to do any subdivision). These algorithms represent general approaches, and should be used as the basis for further thought (rather than simply coding them up wholesale).

Antialiasing with each of these methods is a pretty interesting discussion, but I’ll save that for another post.

Thursday, March 24, 2016

CSS Box Model

Wednesday, February 3, 2016

Color Blending

Color blending is something which is done all the time. Your computer is doing it right now. Literally. However, it requires a little bit of thought to get it right.

There are four pieces involved when blending:
  • Source color
  • Coverage information (alpha)
  • Destination color
  • A working color space

We can all agree on what coverage is. You simply model each sample as a square, or rectangle, or circle, and decide on how much of that area is covered by the foreground. Therefore, it is fractional and unitless (because it’s a ratio). It can never be greater than 1 or less than 0. It is associated with the foreground because it represents the geometry in the foreground. You may have more than one value per sample (for example, if you are interested in each sub-pixel individually).

The working color space is the color space that our blending computations are performed in. This color space must be a linear color space. This means that, if you have a number which represents one channel of a color, and you double it, the amount of light it represents must also exactly double.

sRGB is not a linear color space. However, we can come up with a conceptual “linearized sRGB” color space which uses the same primaries as sRGB and same 0-point and 1-point, but uses linear interpolation between 0 and 1. Converting from sRGB into this new color space is simply raising each value to the 2.2 power. (The conversion is actually a little more complicated than that - it uses a piecewise function - but we’re only discussing the conceptual model here.)

So, the first step is to convert all our colors into this working color space. Then, each blending operation is performed by taking a weighted average of the color primaries’ values, using the coverage information as a weight. Successive blending operations are performed back-to-front.

The formula for this weighted average is:

Source Primary * Source Alpha + Destination Primary * (1 - Source Alpha)

You can see that if the alpha is 0, the result is equal to the destination, and if the alpha is 1, the result is equal to the source. This is simply a linear interpolation between the two.

Now, it turns out that the requirement of rendering items from back-to-front is greatly constraining. It means that if we have a whole bunch of items to blend together, we can’t precompute the result of blending certain items together, and then blend those with a background. This is because the formula above is not associative.

However, a very similar formula is associative:

Source Primary + Destination Primary * (1 - Source Alpha)

The only difference between this formula and the original is the replacement of “Source Primary * Source Alpha” with “Source Primary.”

Well, let’s come up with a new concept, called a “premultiplied color.” This is the same thing as a regular color, except the values in the primaries’ channels have already been multiplied by the alpha of the color. This is possible because the color primaries’ values and the alpha channel have the same lifetime, so we can perform the multiplication at the time when this object is created.

Well, we can see that if we use these objects in the associative formula, we get the same answer as before (because the new “Source Primary” is equal to the old “Source Primary” times the Source Alpha; this multiplication is performed inside the “Premultiplied Color” object). However, we get the benefit of using an associative formula.
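A quick numeric check of the associativity claim, as a sketch: colors are (r, g, b, a) tuples in the linear working space, and the alpha channel is run through the same formula as the primaries (an assumption the text above doesn’t spell out, but it is what makes a precomputed intermediate result usable later). The layer values are arbitrary.

```python
def premultiply(r, g, b, a):
    """Turn a regular color into a premultiplied color."""
    return (r * a, g * a, b * a, a)

def over(src, dst):
    """Source + Destination * (1 - Source Alpha), applied to every channel."""
    keep = 1.0 - src[3]
    return tuple(s + d * keep for s, d in zip(src, dst))

# Three arbitrary layers, listed front to back.
top = premultiply(1.0, 0.0, 0.0, 0.5)
middle = premultiply(0.0, 1.0, 0.0, 0.25)
bottom = premultiply(0.0, 0.0, 1.0, 1.0)

# Back-to-front: blend middle onto bottom, then top onto that.
back_to_front = over(top, over(middle, bottom))
# Precompute the two front layers first, then blend onto the background.
grouped = over(over(top, middle), bottom)
# Both orders of evaluation give the same color (up to rounding).
```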

Therefore, with premultiplied colors, you can group your blending operations however you like: the items keep their stacking order, but you can blend any adjacent items together first and blend that result with the rest later. It’s worth noting that using premultiplied colors is not a requirement - if you blend out-of-order with premultiplied colors, you will get the exact same result as if you had blended back-to-front with non-premultiplied colors. This also means that you can start blending out-of-order, but if you notice that you have blended all the deepest items, you can transition to non-premultiplied blending halfway through all your blending operations. The answer will be the same.

By contrast, using a linear working color space is a hard requirement. If you don’t do this, your math will yield values which are meaningless. Once you’re done with all your blending, you usually want to write out the output in a well-known colorspace (such as sRGB), which means you usually have to un-linearize the result just before output.

Because of this, linearization / unlinearization should be the first and last steps. Premultiplication and unpremultiplication should be the second and second-to-last steps (if they are used at all). Premultiplication is optional and you can even unpremultiply halfway through your calculations if some conditions are met.

Note that linearizing / unlinearizing sRGB can have some pretty dramatic results. For example, if you blend pure black and pure white (technically “sRGB black” and “sRGB white”) with 50% alpha, you end up with your (resulting sRGB) primaries having values of 74%, nowhere near the 50% you would get if you performed the same calculation (incorrectly) in the non-linear sRGB space.
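The black-and-white example can be reproduced with a few lines. This sketch uses the standard piecewise sRGB transfer functions rather than the simplified 2.2 exponent; the function names are my own.

```python
def srgb_to_linear(c):
    """Linearize one sRGB channel value in [0, 1]."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

def linear_to_srgb(c):
    """Un-linearize one linear channel value in [0, 1]."""
    return c * 12.92 if c <= 0.0031308 else 1.055 * c ** (1 / 2.4) - 0.055

def blend(source, destination, alpha):
    """Weighted average of one channel, using coverage as the weight."""
    return source * alpha + destination * (1 - alpha)

def blend_srgb(source, destination, alpha):
    """Linearize first, blend, and un-linearize last."""
    linear = blend(srgb_to_linear(source), srgb_to_linear(destination), alpha)
    return linear_to_srgb(linear)

white, black = 1.0, 0.0
correct = blend_srgb(white, black, 0.5)   # roughly 0.735
naive = blend(white, black, 0.5)          # 0.5, blended in non-linear sRGB
```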

Sunday, January 17, 2016

UTF-16 Surrogate Calculator

Codepoint (hex):
High surrogate (hex):
Low surrogate (hex):
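
The same calculation can be sketched offline. This follows the standard UTF-16 encoding rule for code points above U+FFFF; the function name is made up.

```python
def utf16_surrogates(codepoint):
    """Return (high, low) surrogates for a code point in U+10000..U+10FFFF."""
    if not 0x10000 <= codepoint <= 0x10FFFF:
        raise ValueError("code point does not need a surrogate pair")
    offset = codepoint - 0x10000          # a 20-bit value
    high = 0xD800 + (offset >> 10)        # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)       # bottom 10 bits
    return high, low

high, low = utf16_surrogates(0x1F600)
formatted = f"{high:04X} {low:04X}"       # "D83D DE00"
```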