## Tuesday, April 2, 2013

### Precision qualifiers in OpenGL ES Shading Language

I was reading the spec for the OpenGL ES Shading Language the other day, in particular section 4.5.1, which lays out the precision required for lowp, mediump, and highp numbers.

highp has a magnitude range of (2^-126, 2^127). This means that its exponent can vary over [-126, 126], since the open upper bound of 2^127 is approached when the exponent is 126 and all the mantissa bits are 1s. That gives 253 possible exponent values, so at least 8 bits must be reserved for the exponent (and there's even space left over for denormalized numbers and infinity/NaN).
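The exponent count is easy to double-check with a few lines of Python. `exponent_bits` is a helper name made up for this sketch, and the last check assumes 23 stored mantissa bits, as in an IEEE single.

```python
def exponent_bits(min_exp: int, max_exp: int) -> tuple[int, int]:
    """Count the exponents in [min_exp, max_exp] and the bits needed to encode them."""
    count = max_exp - min_exp + 1
    # (count - 1).bit_length() is the smallest n with 2**n >= count
    bits = (count - 1).bit_length()
    return count, bits

# highp: exponents -126..126 cover magnitudes in (2^-126, 2^127)
count, bits = exponent_bits(-126, 126)
print(count, bits, 2**bits - count)  # 253 8 3  (3 codes left over)

# Exponent 126 with all 23 mantissa bits set stays just below 2^127
largest = 2.0**126 * (2 - 2.0**-23)
print(largest < 2.0**127)  # True
```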

In addition, highp has a relative precision of 2^-24. If we read that as the worst-case relative rounding error, and rounding to nearest is off by at most half an ULP, then the size of one ULP between 1 and 2 is 2^-23. This means that there are 2^23 ULPs between 1 and 2, and, because 1 and 2 are consecutive powers of two, the mantissa must store 23 bits; the implied leading 1 at the beginning of the mantissa brings that to 24 bits of precision. The spec also says that the floating point range extends down to -2^126, which means that there's a sign bit. Adding these together, we have 1 sign bit, plus 8 exponent bits, plus 23 mantissa bits, which makes for a regular 32-bit float.
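To see how these fields line up in an actual IEEE 754 single, Python's standard `struct` module can reinterpret a `'f'` (binary32) value as a 32-bit unsigned integer. The helper functions are hypothetical names for this sketch, and `next_float32_up` only handles positive finite inputs. It shows a ULP of 2^-23 between 1 and 2, so the spec's 2^-24 corresponds to half an ULP, the worst-case error when rounding to nearest.

```python
import struct

def float32_fields(x: float) -> tuple[int, int, int]:
    """Split x, rounded to IEEE 754 binary32, into (sign, biased exponent, stored mantissa)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

def next_float32_up(x: float) -> float:
    """Next binary32 value above a positive, finite binary32 value x."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    (y,) = struct.unpack(">f", struct.pack(">I", bits + 1))
    return y

print(float32_fields(1.0))    # (0, 127, 0): sign, biased exponent, mantissa
print(float32_fields(-1.0))   # (1, 127, 0): only the sign bit changes
ulp_at_one = next_float32_up(1.0) - 1.0
print(ulp_at_one == 2.0**-23)      # True: 2^23 steps between 1 and 2
print(ulp_at_one / 2 == 2.0**-24)  # True: half an ULP at 1
```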

The same analysis can be performed on mediump. Its magnitude range is (2^-14, 2^14), which means that the exponent must be able to take on any value in [-14, 13], making for 28 possible exponent values. That takes 5 exponent bits (again with room left over for denormalized numbers and infinity/NaN).
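Python's `struct` module also supports IEEE 754 binary16 through the `'e'` format (Python 3.6 and later), which makes it possible to decode the extreme normal half floats directly. `half_from_bits` is a hypothetical helper for this sketch.

```python
import struct

def half_from_bits(bits: int) -> float:
    """Decode a 16-bit pattern as an IEEE 754 binary16 value."""
    (value,) = struct.unpack("<e", struct.pack("<H", bits))
    return value

# mediump needs exponents -14..13: 28 values, which fit in 5 bits
count = 13 - (-14) + 1
print(count, (count - 1).bit_length())     # 28 5

# binary16 uses a bias of 15; biased fields 1..30 are normal, 0 and 31 are special
smallest_normal = half_from_bits(0x0400)   # exponent field 1, mantissa 0
largest_finite = half_from_bits(0x7BFF)    # exponent field 30, mantissa all ones
print(smallest_normal == 2.0**-14)         # True
print(largest_finite)                      # 65504.0, comfortably above 2^14
```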

The relative precision of a mediump number is 2^-10. By the argument above, that calls for at least 9 stored mantissa bits (10 bits of precision counting the implied leading 1), and the 10 stored bits of a half float satisfy it with one bit to spare. mediump is also required to extend down to -2^14, so a sign bit is necessary. 1 sign bit + 5 exponent bits + 10 mantissa bits = a 16-bit float.
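The same `struct` trick splits a half float into its 1 + 5 + 10 bit fields. `half_fields` is a hypothetical helper for this sketch. The ULP at 1 comes out to 2^-10, and rounding to nearest keeps the relative error within 2^-11, inside the spec's 2^-10.

```python
import struct

def half_fields(x: float) -> tuple[int, int, int]:
    """Split x, rounded to IEEE 754 binary16, into (sign, biased exponent, stored mantissa)."""
    (bits,) = struct.unpack("<H", struct.pack("<e", x))
    return bits >> 15, (bits >> 10) & 0x1F, bits & 0x3FF

print(half_fields(-1.0))   # (1, 15, 0)

# Step from 1.0 to the next half float by incrementing its bit pattern
(one_bits,) = struct.unpack("<H", struct.pack("<e", 1.0))
(next_up,) = struct.unpack("<e", struct.pack("<H", one_bits + 1))
ulp_at_one = next_up - 1.0
print(ulp_at_one == 2.0**-10)      # True: 10 stored mantissa bits
print(ulp_at_one / 2 == 2.0**-11)  # True: worst-case round-to-nearest error at 1
```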

Things get a little more interesting with lowp. In particular, lowp isn't required to have relative precision, which means that the precision between 0 and 1 is allowed to be the same as the precision between 1 and 2. Instead, its absolute precision is required to be 2^-8, which means that each ULP has a size of 2^-8. The range is (-2, 2), so there are, in total, (2-(-2))/(2^-8) = 2^10 ULPs. Therefore, this "floating point" number can be represented by a 10-bit int counting steps of 2^-8 (that is, a fixed-point number with 8 fractional bits). I suppose this would allow integer registers to be used when the extra precision isn't necessary.
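Here is a minimal sketch of that fixed-point encoding: a signed 10-bit integer counting steps of 2^-8. The constants and function names are my own, clamping out-of-range inputs is an assumption (the spec only gives the range), and Python's `round()` breaks ties to even. It is not meant to describe how any particular GPU actually stores lowp values.

```python
LOWP_FRACTION_BITS = 8   # absolute precision of 2^-8
LOWP_TOTAL_BITS = 10     # 2^10 steps of 2^-8 span the range (-2, 2)

def lowp_encode(x: float) -> int:
    """Round x to the nearest multiple of 2^-8, returned as a signed 10-bit integer."""
    q = round(x * (1 << LOWP_FRACTION_BITS))
    lo = -(1 << (LOWP_TOTAL_BITS - 1))       # -512 decodes to -2.0
    hi = (1 << (LOWP_TOTAL_BITS - 1)) - 1    #  511 decodes to 2 - 2^-8
    return max(lo, min(hi, q))               # clamp to the representable range

def lowp_decode(q: int) -> float:
    """Scale the integer back down by 2^-8."""
    return q / (1 << LOWP_FRACTION_BITS)

print(lowp_encode(1.5), lowp_decode(lowp_encode(1.5)))  # 384 1.5
print(lowp_decode(lowp_encode(0.3)))                     # 0.30078125
print(lowp_decode(511) - lowp_decode(510))               # 0.00390625, one ULP
```

Because every step is the same size, the spacing near 0 and near 2 is identical, which is exactly the "absolute precision" behavior described above.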