Saturday, April 9, 2016

GPU Text Rendering Overview

There are a few different ways to render text using the GPU, and I’ll discuss the ones I’m familiar with here. They represent general strategies rather than complete recipes - they can be mixed and matched, and the list may not be comprehensive. Also, note that I’m only interested in computing coverage here, not color.

First: a little background on text. Glyphs are just sequences of Bezier paths (I’m ignoring “sbix” glyphs and things like that). I’m only interested in TrueType / OpenType fonts, so the form of a glyph is given in the ‘glyf’ table or the ‘CFF ‘ table. The ‘glyf’ table only describes quadratic Bezier curves, while the ‘CFF ‘ table can describe cubic Bezier curves. At first, this may sound like a small difference, but it turns out that the math involving cubic Bezier curves is far more complicated than the math involving quadratic Bezier curves. (For example: finding the intersections of two cubic Bezier curves involves finding the roots of a 9th-degree polynomial - something which can’t, in general, be done in closed form.) Also, the fill rule is different for the two formats: the ‘glyf’ table fills paths with the non-zero winding rule, while the ‘CFF ‘ table fills paths with the even/odd rule. Subpaths may intersect one another, which means that you can’t assume that the right-hand side of a contour is always inside the glyph.
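
To make that fill-rule difference concrete, here’s a rough sketch (operating on a polygonal approximation of the contours; the Point and Contour types are just made up for illustration) of how the two rules decide whether a point is inside a glyph. The only difference is whether you look at the sign of the ray crossings or merely their parity.

    #include <cstddef>
    #include <vector>

    struct Point { double x, y; };
    using Contour = std::vector<Point>;   // closed polygonal approximation of one subpath

    // Count crossings of a horizontal ray from p toward +x. "signedCrossings" tracks
    // direction (for the non-zero rule); "totalCrossings" ignores it (for even/odd).
    void countCrossings(const std::vector<Contour>& contours, Point p,
                        int& signedCrossings, int& totalCrossings) {
        signedCrossings = 0;
        totalCrossings = 0;
        for (const Contour& c : contours) {
            for (std::size_t i = 0; i < c.size(); ++i) {
                Point a = c[i];
                Point b = c[(i + 1) % c.size()];
                bool upward = a.y <= p.y && b.y > p.y;
                bool downward = b.y <= p.y && a.y > p.y;
                if (!upward && !downward)
                    continue;
                // x coordinate where this edge crosses the ray's horizontal line
                double t = (p.y - a.y) / (b.y - a.y);
                double x = a.x + t * (b.x - a.x);
                if (x > p.x) {
                    totalCrossings += 1;
                    signedCrossings += upward ? 1 : -1;
                }
            }
        }
    }

    // Non-zero fill (what the post says 'glyf' uses): inside if the winding count is nonzero.
    bool insideNonZero(const std::vector<Contour>& contours, Point p) {
        int s, t;
        countCrossings(contours, p, s, t);
        return s != 0;
    }

    // Even/odd fill: inside if the ray crosses the outline an odd number of times.
    bool insideEvenOdd(const std::vector<Contour>& contours, Point p) {
        int s, t;
        countCrossings(contours, p, s, t);
        return t % 2 != 0;
    }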

Texture Atlas


The first approach is kind of a hack. The text is still “rendered” on the CPU (the same way it has been done since the 70s), but the final image is uploaded into a texture atlas on the GPU. This approach actually makes the first use of a glyph slower (because of the additional upload step); however, subsequent uses of that glyph are much faster.

If a subsequent use of a glyph is slightly different in position (within a pixel) or size, there are a couple things you can do. If the position is different (one usage has its origin on the left edge of a pixel, and another usage has its origin in the middle), you might be able to get away with simply relying on the GPU’s texture filtering hardware to interpolate the result. If that isn’t good enough for you, you could snap the glyph origins to a few quanta within a pixel, and consider glyphs which differ in this snapped origin to be unique. This approach works similarly for varying glyph sizes - you can either rely on the texture filtering hardware to scale the glyph, or you could snap to a size quanta. (Or both!)
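
As a concrete sketch of that quantization, the cache key for an atlas entry might look something like this. The GlyphKey name, the four-subpixel-positions choice, and the quarter-point size buckets are all just example values, not something any particular text stack actually uses.

    #include <cmath>
    #include <cstdint>

    // Snap a glyph's fractional pixel origin and its size to quanta, and use the
    // result as part of the atlas cache key. Two usages that snap to the same key
    // can share the same atlas entry.
    struct GlyphKey {
        std::uint32_t glyphID;
        std::uint16_t sizeBucket;   // font size in quarter-point buckets
        std::uint8_t  subpixelX;    // fractional x origin, snapped to one of 4 quanta
        std::uint8_t  subpixelY;    // same for y (often left at 0 for horizontal text)

        bool operator==(const GlyphKey& other) const {
            return glyphID == other.glyphID && sizeBucket == other.sizeBucket &&
                   subpixelX == other.subpixelX && subpixelY == other.subpixelY;
        }
    };

    GlyphKey makeGlyphKey(std::uint32_t glyphID, float fontSize,
                          float originX, float originY) {
        const int   kSubpixelQuanta = 4;     // 4 positions per pixel in each axis
        const float kSizeQuantum = 0.25f;    // quarter-point size buckets
        float fracX = originX - std::floor(originX);
        float fracY = originY - std::floor(originY);
        GlyphKey key;
        key.glyphID = glyphID;
        key.sizeBucket = static_cast<std::uint16_t>(std::lround(fontSize / kSizeQuantum));
        key.subpixelX = static_cast<std::uint8_t>(fracX * kSubpixelQuanta);
        key.subpixelY = static_cast<std::uint8_t>(fracY * kSubpixelQuanta);
        return key;
    }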

Signed Distance Field


Valve published a similar approach which they use in the Team Fortress 2 game. Recall how, in the previous approach, the value of each texel is the coverage of that texel. Valve’s approach uses the same idea, except that each texel instead holds a signed distance: the distance of closest approach to the boundary of the glyph, signed so that “inside” values are negative. Using this representation causes bilinear filtering to produce higher-quality results (or, put another way, you can achieve comparable results with fewer texels).
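
Generating one texel of such a field is conceptually simple. Here’s a rough sketch that works on a polygonal approximation of the outline (the same illustrative Point/Contour types as before, and any point-in-glyph test will do for the sign); it’s just to show the idea, not Valve’s actual generator.

    #include <algorithm>
    #include <cmath>
    #include <cstddef>
    #include <limits>
    #include <vector>

    struct Point { double x, y; };
    using Contour = std::vector<Point>;   // polygonal approximation, as before

    // Any point-in-glyph test works here; this is the non-zero fill-rule test
    // sketched earlier.
    bool insideNonZero(const std::vector<Contour>& contours, Point p);

    double distanceToSegment(Point p, Point a, Point b) {
        double dx = b.x - a.x, dy = b.y - a.y;
        double lengthSq = dx * dx + dy * dy;
        double t = lengthSq == 0.0 ? 0.0
                                   : ((p.x - a.x) * dx + (p.y - a.y) * dy) / lengthSq;
        t = std::clamp(t, 0.0, 1.0);
        double cx = a.x + t * dx - p.x;
        double cy = a.y + t * dy - p.y;
        return std::sqrt(cx * cx + cy * cy);
    }

    // The value of one texel: distance of closest approach to the outline,
    // negated if the texel center is inside the glyph.
    double signedDistance(const std::vector<Contour>& contours, Point texelCenter) {
        double best = std::numeric_limits<double>::infinity();
        for (const Contour& c : contours)
            for (std::size_t i = 0; i < c.size(); ++i)
                best = std::min(best, distanceToSegment(texelCenter, c[i],
                                                        c[(i + 1) % c.size()]));
        return insideNonZero(contours, texelCenter) ? -best : best;
    }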

A newer variant of this approach uses additional color channels in the GPU texture to achieve higher-fidelity results (for example, preserving sharp corners that a single-channel distance field tends to round off).

Generated Geometry


The next approach is to generate geometry which matches the contours closely. GPUs can only render triangles, so this approach requires triangulating the input contours. One way to do this is to choose a constant triangle size; a better idea, though, is to increase the triangle density in areas of high complexity and decrease it in areas of low complexity.

This means you want to subdivide the Bezier curves finely where the curve is sharp, and coarsely where it is flat. Luckily, de Casteljau’s algorithm already does this for you: if you evaluate a Bezier curve at equally spaced parameter values using that construction, the resulting points will be closer together where the curve is sharp, and farther apart where it is flat.
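
Here’s a sketch of flattening a single quadratic segment that way. The fixed “steps” count is a stand-in; a real renderer would tie it to the curve’s on-screen size (more on that below).

    #include <vector>

    struct Point { double x, y; };

    Point lerp(Point a, Point b, double t) {
        return { a.x + (b.x - a.x) * t, a.y + (b.y - a.y) * t };
    }

    // One evaluation of a quadratic Bezier via de Casteljau's construction:
    // repeatedly interpolate between neighboring control points.
    Point evalQuadratic(Point p0, Point p1, Point p2, double t) {
        Point a = lerp(p0, p1, t);
        Point b = lerp(p1, p2, t);
        return lerp(a, b, t);
    }

    // Flatten one 'glyf'-style quadratic segment into "steps" line segments by
    // sampling at equally spaced parameter values. The sample points naturally
    // bunch up where the curve turns sharply.
    std::vector<Point> flattenQuadratic(Point p0, Point p1, Point p2, int steps) {
        std::vector<Point> points;
        points.reserve(steps + 1);
        for (int i = 0; i <= steps; ++i)
            points.push_back(evalQuadratic(p0, p1, p2, double(i) / steps));
        return points;
    }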

Once you’ve done the subdivision, you can use a “Constrained Delaunay Triangulation” to actually run the triangulation. This is similar to a regular Delaunay triangulation, except that it can guarantee that particular edges are present in the triangulation. This means you can guarantee that no triangle will cross a contour. Therefore, each triangle can be considered to be entirely inside or entirely outside the glyph, and can be shaded accordingly.

Stencil Buffer Approach


If you don't want to run that triangulation, you can use the GPU’s stencil buffer (or equivalent) to calculate coverage instead. The idea is that you use the subdivision points to model the contours as a sequence of line segments. Then, you pick a point (let’s call it P) way off somewhere (it can be arbitrary), and, for every line segment, form a triangle with that line segment and that point P. When you do that, you’ll have lots of overlapping triangles.
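
In code, producing those overlapping triangles is just a loop over the flattened contours; something like this sketch (the function name is made up):

    #include <cstddef>
    #include <vector>

    struct Point { double x, y; };

    struct Triangle { Point a, b, c; };

    // One triangle per contour segment, all sharing the same (arbitrary) pivot P.
    // Keeping the segment endpoints in contour order is what makes each triangle's
    // facing (clockwise vs. counter-clockwise) encode the direction of travel.
    std::vector<Triangle> buildStencilTriangles(
            const std::vector<std::vector<Point>>& flattenedContours, Point pivot) {
        std::vector<Triangle> triangles;
        for (const auto& contour : flattenedContours) {
            for (std::size_t i = 0; i < contour.size(); ++i) {
                Point a = contour[i];
                Point b = contour[(i + 1) % contour.size()];  // wrap around to close the loop
                triangles.push_back({ pivot, a, b });
            }
        }
        return triangles;
    }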

You can then set up the stencil buffer to say “increment the texel’s counter if the triangle you’re shading has positive area, and decrement the texel’s counter if the triangle you’re shading has negative area” (where “positive area” and “negative area” refer to shading the “front” or the “back” of the triangle, and are determined by whether its points are submitted in clockwise or counter-clockwise order). If you shade all the triangles like this, the overlapping triangles cancel out, and you’re left with nonzero counters in exactly the places where the glyph lies. You can then set up the stencil test to say “only output a value if the stencil buffer has a nonzero value.” Note that this only works for font files which use the non-zero winding rule.
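
In OpenGL terms (other APIs have equivalents), the stencil state for those two passes might look something like the following sketch. The draw calls themselves are elided, and I’m assuming a context whose stencil buffer has already been cleared to zero.

    #include <GL/glew.h>   // or glad, or whichever GL loader/header your platform uses

    void drawGlyphWithStencil() {
        // Pass 1: count windings. Front-facing triangles increment the stencil
        // value, back-facing ones decrement it; no color is written yet.
        glEnable(GL_STENCIL_TEST);
        glDisable(GL_CULL_FACE);                              // we need both facings
        glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
        glStencilFunc(GL_ALWAYS, 0, 0xFF);
        glStencilOpSeparate(GL_FRONT, GL_KEEP, GL_KEEP, GL_INCR_WRAP);
        glStencilOpSeparate(GL_BACK,  GL_KEEP, GL_KEEP, GL_DECR_WRAP);
        // ... draw the per-segment triangles here ...

        // Pass 2: shade only where the winding count ended up nonzero.
        glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
        glStencilFunc(GL_NOTEQUAL, 0, 0xFF);
        glStencilOp(GL_KEEP, GL_KEEP, GL_KEEP);
        // ... draw a quad covering the glyph's bounding box here ...

        glDisable(GL_STENCIL_TEST);
    }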

This approach has an obvious quality/performance tradeoff in the subdivision density: the higher the density, the slower the rendering, but the better-looking the results. Also, you want the subdivision density to be roughly proportional to the font size, since you want the subdivision density to be roughly equal in screen space for all glyphs rendered. Unfortunately, it’s difficult to use this approach to get high-quality rendering without super tiny triangles.

Loop-Blinn Method


In the first method, glyph coverage information was represented by a texture. In the third method, it was represented by geometry. Another method, the Loop-Blinn method, represents glyph coverage information with mathematical formulas: it tries to represent a particular contour in a way that can be evaluated inside a fragment shader.

In particular, in order to draw a contour, you draw a single triangle which encompasses the entire contour. Inside the triangle, you define a scalar field: each point within the triangle has a scalar value associated with it. You can construct this scalar field in such a way that the following properties hold:

  • Scalar values which are negative are “inside” the contour, and scalar values which are positive are “outside” the contour
  • The value of the field at any point can be computed in closed form, knowing only that point’s location within the triangle, plus some interpolated information associated with the triangle’s vertices

This means that, given some vertex attributes, you can run some math in a pixel shader which will tell you if the shaded point is inside or outside the contour. So, for each contour, you consider a single triangle which includes the contour, and you then calculate some magic values to associate with each vertex of the triangle. Then, you shade that triangle with a fairly simple pixel shader which computes a closed-form equation to determine the coverage.
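
For a quadratic segment, those magic values are the canonical texture coordinates (0, 0), (1/2, 0), and (1, 1) from the Loop-Blinn paper, assigned to the segment’s three control points; the GPU interpolates them across the triangle, and the pixel shader just checks which side of the canonical parabola u² = v the fragment landed on. Here’s that test written as plain C++ standing in for the shader; the flipOrientation flag is my own stand-in for however you decide to handle contour orientation.

    struct UV { float u, v; };

    // Interpolated (u, v) arrives per fragment from the vertex stage. The scalar
    // field described above is f(u, v) = u^2 - v: zero exactly on the curve, with
    // opposite signs on the two sides of it.
    bool quadraticSegmentCoverage(UV uv, bool flipOrientation) {
        float f = uv.u * uv.u - uv.v;
        return flipOrientation ? (f > 0.0f) : (f < 0.0f);
    }

In a real renderer this logic lives in the fragment shader, with the (u, v) pair passed down as a varying from the triangle’s vertices.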

Note that the formulas involved with the Loop-Blinn method are much simpler for quadratic Bezier curves than for cubic Bezier curves. However, the general approach still works for cubic curves - the difference is that the formulas are bigger (and you need to perform an additional classification step).

Also note that this approach can still use a Constrained Delaunay Triangulation, because you still need to generate triangles that lie entirely within the bounds of the glyph. However, there is no need for the heuristic subdivision used in the previous methods; instead, all the triangles of the mesh are created from the control points of the contours themselves.

This means that the quality of the curves is defined by mathematical formulas, which means that it is effectively infinitely scalable. In fact, the information in the glyph contours can be losslessly converted to the information used for the Loop-Blinn method.

Overall, these methods are not monolithic things, and can be used in conjunction with one another. For example, you could use the stencil buffer approach to shade the insides of the glyph, but the Loop-Blinn method to shade the contours themselves (so that you don’t have to do any subdivision). These algorithms represent general approaches, and should be used as the basis for further thought (rather than simply coding them up wholesale).

Antialiasing with each of these methods is a pretty interesting discussion, but I’ll save that for another post.