Monday, July 31, 2017

Wide and Deep Color in Metal and OpenGL

“Wide Color” and “Deep Color” refer to different things. A color space is “wide” if its gamut is bigger than sRGB’s. “Gamut” roughly corresponds to how saturated a representable color can be: the wider the color space, the more saturated the colors it can represent.

“Deep color” refers to the number of representable values in a particular encoding of a color space. An encoding of a color space is “deep” if it has more than 2^24 representable values.

Consider widening a color space without making it deeper. You have the same number of representable colors, but the individual points are stretched farther apart, so the density of representable colors decreases. This is a problem because our eyes may be able to distinguish adjacent colors at a finer granularity than the granularity at which they are represented. This commonly leads to “banding,” where what should be a smooth gradient of color over an area appears to our eyes as stripes of individual colors.

Consider deepening a color space without making it wider. You are squeezing more and more points into the same volume of colors, increasing their density. Adjacent points may now be so close that our eyes can’t distinguish them. The result is image quality that isn’t any better, but more information is required to store the image, resulting in wasted space.

The trick is to do both at once. Widening the gamut, and increasing the number of representable values within that gamut, keeps the density of points roughly equivalent. More information is required to store the image, and the image looks more vibrant to our eyes.


Originally, OpenGL itself didn’t specify what color space its result pixels are in. At the time it was created, this meant that by default, the results were interpreted as sRGB. However, sRGB is a non-linear color space, which means that math on pixel values is meaningless. Unfortunately, alpha blending is math on pixel values, which meant that, by default, blend operations (and all math done in pixel shaders, unless explicitly corrected by the shader author) were broken.
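To see why math on gamma-encoded values is broken, here is a minimal Python sketch (using the standard sRGB transfer functions from IEC 61966-2-1) comparing a naive 50/50 blend of black and white against the same blend done in linear light:

```python
# Why averaging gamma-encoded sRGB values is wrong: compare a naive
# blend on encoded values against the correct blend in linear light.

def srgb_to_linear(v):
    # IEC 61966-2-1 decoding function
    return v / 12.92 if v <= 0.04045 else ((v + 0.055) / 1.055) ** 2.4

def linear_to_srgb(v):
    # IEC 61966-2-1 encoding function
    return v * 12.92 if v <= 0.0031308 else 1.055 * v ** (1 / 2.4) - 0.055

black, white = 0.0, 1.0

# Naive 50/50 blend directly on the encoded values:
naive = (black + white) / 2  # 0.5

# Correct blend: linearize, blend, re-encode:
correct = linear_to_srgb((srgb_to_linear(black) + srgb_to_linear(white)) / 2)

print(naive, correct)  # correct is noticeably brighter (~0.735)
```

The two answers differ substantially, which is exactly the error you see in renderers that blend encoded values as if they were linear.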

One solution is to simply make the operating system interpret the pixel results as “linear sRGB.” Indeed, macOS lets you do this by setting the colorSpace property of an NSWindow or CGLayer. Unfortunately, this doesn’t give good results, because these pixel results are in 24-bit color, and all of these representable colors should be (roughly) perceptually equidistant from each other. Our eyes, though, are better at perceiving color differences in low light, which means that dark colors need a higher density of representable values than bright colors. In “linear sRGB,” the density of representable values is constant, so we don’t have enough definition for dark colors to look good. Increasing the density of representable values everywhere would solve the problem for dark colors, but it would waste information on bright colors. (This extra information would cost GPU bandwidth, which would probably be fine for just displaying the image on a monitor, but not all GPUs support rendering to > 24-bit color…)
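You can quantify how badly a linear 8-bit encoding starves the darks. The smallest nonzero code in a linear encoding is 1/255 of full brightness; the sketch below (again using the standard sRGB transfer function) counts how many 8-bit sRGB codes would fit underneath that single linear step:

```python
# How many 8-bit sRGB codes fall below the FIRST nonzero code of a
# linear 8-bit encoding? (Transfer function per IEC 61966-2-1.)

def linear_to_srgb(v):
    return v * 12.92 if v <= 0.0031308 else 1.055 * v ** (1 / 2.4) - 0.055

first_linear_code = 1 / 255                  # smallest nonzero linear value
encoded = linear_to_srgb(first_linear_code)  # ≈ 0.0498
print(round(encoded * 255))                  # ≈ 13 sRGB codes fit below it
```

Roughly a dozen distinguishable dark shades collapse into a single linear code, which is why linear 24-bit framebuffers band so visibly in shadows.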

So the colors in the framebuffer need to be in regular sRGB, not linear sRGB. But this means that blending math is meaningless! OpenGL solved this by creating an extension, EXT_framebuffer_sRGB (which later got promoted to be part of OpenGL Core), which says “whenever you want to perform blending, read the sRGB destination color from the framebuffer, convert it to a float, linearize it, perform the blend, delinearize it, convert it back to 24-bit color, and store it to the framebuffer.” This way, the final results are always in sRGB, but the blending is done in linear space. This ugly processing only happens on the framebuffer color, not on the output of the fragment shader, so your fragment shader can assume that everything is in linear space, and any math it performs will be meaningful.
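The steps above can be modeled in a few lines of Python. This is only a sketch of what the hardware does for a classic source-over blend (the real pipeline works per-channel on RGBA and may use higher intermediate precision), but the order of operations is the point:

```python
# Model of sRGB-framebuffer blending for a source-over blend:
# the shader output is linear; the destination is stored gamma-encoded.

def srgb_to_linear(v):
    return v / 12.92 if v <= 0.04045 else ((v + 0.055) / 1.055) ** 2.4

def linear_to_srgb(v):
    return v * 12.92 if v <= 0.0031308 else 1.055 * v ** (1 / 2.4) - 0.055

def blend_over_srgb_framebuffer(src_linear, src_alpha, dst_byte):
    dst_linear = srgb_to_linear(dst_byte / 255)        # read + linearize
    out_linear = src_linear * src_alpha + dst_linear * (1 - src_alpha)
    return round(linear_to_srgb(out_linear) * 255)     # re-encode + store

print(blend_over_srgb_framebuffer(1.0, 0.5, 0))  # 50% white over black
```

Note that the shader-facing value (`src_linear`) never touches the gamma curve; only the framebuffer read and write do.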

The trigger to perform this processing is a special format for the framebuffer (so it’s an opt-in feature). Now, in OpenGL, the default framebuffer is not created by OpenGL. Instead, it is created by the Operating System and handed as-is to OpenGL. This means that you have to tell the OS, not OpenGL, to create a framebuffer with one of these special formats. On iOS, you do this by setting the drawableColorFormat of the GLKView. Note that opting in to sRGB is not orthogonal to using other formats - only certain formats are compatible with the sRGB processing.

On iOS, as far as I can tell, OpenGL does not support wide or deep color (because you can’t tell the OS how to interpret the pixel results of OpenGL like you can on macOS - all OpenGL pixels are assumed to be in sRGB). CAEAGLLayer doesn’t have a “colorSpace” property. I can’t find any extended-range formats.


On iOS, Metal supports the same sRGB / non-sRGB formats that OpenGL does. You can set the MTKView’s colorPixelFormat to one of the sRGB formats, which has the same effect as it does in OpenGL. Setting it to a non-sRGB format means that blending is performed as-is, which is broken; the sRGB formats, however, perform the correct linearization / delinearization.

iOS doesn’t support the same sort of color space annotation that macOS does. In particular, a UIWindow or a CALayer doesn’t have a “colorspace” property. Because of this, all colors are expected to be in sRGB. For non-deep and non-wide color, using the regular sRGB pixel formats is sufficient, and these will clamp to the sRGB gamut (meaning clamped between 0 and 1).

And then wide color came along. As noted earlier, wide color and deep color need to happen together, so they aren’t controllable independently. However, there is a conundrum: because the programmer can’t annotate a particular layer with the color space its values should be interpreted as, how do you represent colors outside of sRGB? The solution is to extend the color space beyond the 0 - 1 range. Colors within 0 - 1 are interpreted as sRGB, as they always have been, while colors outside that range represent the new wider colors. It’s important to note that, because the new gamut completely includes sRGB, values must be able to be negative as well as greater than 1. A completely saturated red in the display’s native color space (which is similar to P3) has negative components for green and blue.
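You can verify the claim about saturated red with a quick computation. The sketch below converts a fully saturated Display P3 red into linear extended sRGB by going through XYZ, using the commonly published D65 linear-RGB↔XYZ matrices (treat the exact coefficients as an assumption of this sketch):

```python
# A fully saturated Display P3 red, expressed in linear extended sRGB.
# Matrices are the commonly published linear-RGB <-> XYZ (D65) conversions.

P3_TO_XYZ = [
    [0.48657, 0.26567, 0.19822],
    [0.22897, 0.69174, 0.07929],
    [0.00000, 0.04511, 1.04394],
]
XYZ_TO_SRGB = [
    [ 3.24097, -1.53738, -0.49861],
    [-0.96924,  1.87597,  0.04156],
    [ 0.05563, -0.20398,  1.05697],
]

def mul(m, v):
    # 3x3 matrix times 3-vector
    return [sum(m[r][c] * v[c] for c in range(3)) for r in range(3)]

p3_red = [1.0, 0.0, 0.0]
r, g, b = mul(XYZ_TO_SRGB, mul(P3_TO_XYZ, p3_red))
print(r, g, b)  # red > 1, green and blue slightly negative
```

The red component lands around 1.22 while green and blue go slightly negative - exactly the kind of value an extended-range, signed format exists to hold.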

The mechanism for enabling this is similar to OpenGL: you select a new special pixel format. The new pixel formats have “_XR” in their name, for “extended range.” These formats aren’t clamped to 0 - 1. sRGB applies here, too: the new extended-range pixel formats have sRGB variants, which perform a similar gamma function as before. This gamma function is extended (in the natural way) to values greater than 1, and for values less than 0, the curve is mirrored through the origin (making it an “odd” function).

Using these new pixel formats takes your colors from 8 bits per channel to 10 bits per channel. The new 10-bit-per-channel colors are signed (because they can go below 0), which means there are 4 times as many representable values, half of which are below 0, so the number of positive representable values has doubled. In a non-sRGB variant, the maximum value is just around 2, but in an sRGB variant, the maximum representable linear value is greater than 2 because of the gamma curve.

On macOS, there is a way to explicitly tell the system how to interpret the color values in an NSWindow or a CALayer using the colorspace property. This works because there is a secondary pass which converts the pixels into the color space of the monitor. (Presumably iOS doesn’t have this pass for performance reasons, leading to the restriction on which color spaces pixel values can be represented in.) Therefore, to output colors using P3, simply assign the appropriate color space value to the CALayer you are using with Metal. If you do this, remember that “1.0” no longer represents sRGB’s 1.0; instead it represents the most saturated color in the new color space. If you don’t also change your rendering code to compensate, your colors will be stretched across the gamut, leading to oversaturated colors and ugly renderings. Alternatively, you can set the layer to the new “Extended sRGB” color space of CGColor, which gives you the same rendering as iOS (and allows values > 1.0). Note that if you do this, you can’t render to an integer pixel format, because those are clamped at 1.0; instead, you’ll have to render to a floating-point pixel format so that you can have values > 1.0.

So, on iOS, you have one switch which turns on both deep color and wide color, and on macOS, you have two switches, one of which turns on wide color and one of which turns on deep color.