## Thursday, May 21, 2015

### Colorspaces

There are many explanations of color spaces, but all the discussions of it I have heard have missed some points which I think are important when thinking about color spaces. In particular, there are many related pieces of a color space which all interact to define a color.

First off, a set of colors exists in the world. Physically, a color is a distribution of light frequencies at varying intensities. You can think of these colors as a “space,” similar to a vector space, but not necessarily linear. The axes in this “space” may be stretched or squished according to arbitrary formulas; however, the axes do span the space (using the mathematical definition of “span”).

A color space defines axes (again: not necessarily linear) through this space. Note that, like a vector space, these axes theoretically extend infinitely, which means that the space of representable colors extends infinitely in all directions. If you only consider these axes, all colors can be represented in all of these sets of axes. Because of this property, if you have two of these sets of axes, you can always represent all points in both of them.

However, color spaces define their own “gamut” which is simply the bounds of the color space. This means that there is a watertight seal around the space of colors which the color space can represent. Because two colorspaces might not have the same gamut, this means that there can be colors that are representable in one that are not representable in the other. (Before we introduced gamut, this was not possible; but now that we have this artificial limit of each space, it is possible). However, gamut is just the outermost bounds; all colors within this bounds are representable.

Now, let’s talk about computers. Computers can’t represent infinitely long decimals. They have to quantize (round) all the numbers they work with to particular representable values. For example, an int cannot hold the value 0.5, and a float cannot hold the value 16777217. Therefore, when computers represent a color in a color space, they can only represent discrete points inside the space, rather than all the points, continuously.

So, there is this problem of how to choose which specific points are representable in a color space. You may want to just divide up the space evenly into little grid squares, but this has a problem. In particular, depending on the color, our eyes are better or worse at determining slight variations in color. This means that, around the part of the gamut where our eyes are really good at distinguishing color, we want to make the grid squares small, so the representable values better match what we can actually see. Conversely, around the parts where our eyes don’t work so well, it is wasteful to have the grid cells be smaller than what we can actually distinguish. In effect, what we want to do is take some of those representable values, and squish them so they are denser around part of the gamut, and less dense around other parts.

So, color spaces, define formulas of how to do this. In particular, if you have a value on a particular axis, they define how far down that axis to travel. This function is often non linear. This formula allows you to connect some arbitrary number with a location in the gamut. The bounds of the gamut are often described in terms of these numbers.

“Gamma” is just the particular function (y = x ^ gamma) that sRGB uses that does this.

Now, if you consider this mathematically, all we are doing is representing the same space with different axes, which, mathematically, is completely possible and lossless. It just means that, in this other coordinate system, the numbers which represent the bounds of the gamut will (might) be different, even though the gamut itself has not changed.

That’s fine, except that when we do math on colors (which we do all the time in pixel shaders), we expect the space to behave like a linear vector space. If we have a color, and we double it, we expect to be twice as far from the origin as we were before. However, this means that the space we do math in has to be linear. So, if we want to do math on colors, we have to stretch and squish the axes in the opposite way, in order to get the squished space back to being linear. However, we are now no longer in the original color space! (Because each color space includes the squishiness characteristics of its axes.)

Therefore, if we have a color in a non-linear colorspace, and we have to do math on it, but at the end have a color in that same color space, we must convert the color into a linear space, do math, then convert it back. We don’t want to keep all colors in a linear color space all the time, because that would mean that we would have to cover the entire gamut with grid cells that are as small as the smallest one in the non-linear color space, which means we need more bits to represent our colors (smaller grid cells means more of them).

It also means that we should use a very high precision number format to do math in, and only convert back to nonlinear when we’re done. Each conversion we do is a little bit lossy. Shaders execute with floating point precision (or half float precision) which is much more precision than a single 8-bit uint (which is typically used to represent colors in sRGB).

Now, it turns out that all this stuff was being implemented when computers all used CRT monitors. Now, it turns out that the voltage response of a CRT monitor is almost exactly the inverse of the squishiness of sRGB. This means, if you feed your sRGB-encoded image directly to a CRT, it will, by virtue of physics of electron beams, just happen to exactly undo the sRGB encoding and display the correct thing. This means that, back then, all images were stored in this form, both because 1) No conversion was needed before feeding it directly to the monitor, and 2) the density of information was in all the right places. This practice continues to this day. (It means that when digital cameras take a picture, they actually perform an encoding into sRGB before saving out to disk). Therefore, if you naively have a pixel shader which samples a photograph without doing any correction, you probably are doing the wrong thing.

## OpenGL

In OpenGL, if you are in a fragment shader and you are reading data from a texture whose contents are in a non-linear color space, you must convert that color you read into a linear color space before you do any math with it. But wait - doesn’t texture filtering count as math? Yes, it does. Doesn’t this mean that I have to convert the colors into a linear color space before texture filtering occurs? Yes, it does. One of the texture formats you can mark a texture with is GL_SRGB8 (sRGB is nonlinear), and when you do this, the hardware will run the linearizing conversion on the data read before filtering (Actually implementations are free to do this conversion after filtering, but they should do it before). Then, throughout the duration of the fragment shader, all your math is done in a linear color space, using floats.

But what about the color you output from you fragment shader? Doesn’t blending count as math? Yes, it does. When you enable GL_FRAMEBUFFER_SRGB, OpenGL lets you use sRGB textures when you create a framebuffer. You can also query whether or not your frame buffer is using sRGB or not using glGetFramebufferAttachmentParameter(). When you set this up, the GPU will convert from your linear colorspace into sRGB, and it will do this such that blending occurs correctly (linearly).

The OpenGL spec includes the exact formulas which are run when you convert into and out of sRGB from a linear colorspace. I’m not sure if the linearized sRGB colorspace has a proper name (but it is definitely not sRGB!)

As alluded to earlier, an alternative would be to keep everything in a linear space all the time, but this means that you need more bits to represent colors. In practice the general wisdom is to use at least 10 bits of information per channel. This means, however, that if you are trying to represent your color in 32 bits, you only have 2 bits left over for alpha (using the GL_RGB10_A2 format). If you need more alpha than that (almost certainly), you will likely need to go up to 16 bits per channel, which means doubling your memory use. At that point, you might as well use half floats so you can get HDR as well (assuming your hardware supports it).

Alpha isn’t a part of color. No color space includes mention of alpha. Alpha is ancillary data that does not live within a color space, the way that color does. We put alpha into the w component of a vec4 out of convenience, not because it is ontologically part of the color. Therefore, when you are converting a color between colorspaces, make sure you don’t touch your alpha component (The GL spec follows this rule when describing how it converts into and out of sRGB).