Litherum: Complex Text Handling in WebKit, Part 2: Fonts

Before we go any further, I’d like to talk a little bit about what information is contained within font files. Ultimately, font files dictate the ground truth for how to lay out a particular run of text. There are two common formats of font files: TrueType fonts and OpenType fonts. TrueType was developed by Apple, and when a licensing agreement between Apple and Microsoft couldn’t be reached, Microsoft built on top of TrueType to make OpenType. Both of these formats are similar and use the same underlying file structure. My experience is with TrueType, so that’s what I’m going to focus on here.

A TrueType font file is made up of a collection of sections, each called a “table.” There is also an directory table (which appears first) which describes the locations and sizes of all of the other tables. Each table is denoted by 4 ascii characters which describe what the table is describing. For example, the table that describes the character-to-glyph mapping has a tag of “cmap” and the table that describes naming information has a tag of “name”. Because unknown tables will simply be ignored by any program that is parsing the file, the format can be extended with different kinds of tables.

One of the duties of a font system is to translate a sequence of characters (read: code points) into a sequence of glyphs. Each glyph is identified by a number (which can later be used to get the information needed to draw the glyph). It is important to note that the set of codepoints that the font knows about is arbitrary and different from font file to font file. In addition, the glyph IDs themselves are arbitrary and different from font file to font file. (e.g. If you ask the font system which glyph should be drawn for the letter ‘a’, you cannot use that glyph ID with another font - it may point to a different glyph entirely!). The code point to glyph mapping is done using the “cmap” table in the font file. The specifics of how the lookup is completed are not really all that complicated, but I’ll let you read [1] if you’re interested.

Okay cool, so now we have a glyph ID for the particular code point that we’re trying to draw. We now have to figure out where in the font file the relevant information regarding this glyph is kept. Each glyph can contain a different number of contours and control points, and glyph information is not padded, so we have to look up where the relevant glyph information is kept. This information is inside the “loca” table. This table is simply an array of offsets, so figuring out where to start looking is very easy.

Alright, now we have an offset, and we now know where to start looking for information about how to draw our glyph. The offset is an offset into the “glyf” table, which describes the contours that make up the outline of the relevant glyph. TrueType has two different notions of glyphs: simple glyphs and compound glyphs. Compound glyphs are simply created by drawing multiple other glyphs with different positions and scales. There might be a compound glyph for “ñ” which is composed of drawing the glyph for “n” and the glyph for “~” on top of each other.

A simple glyph is described by a collection of contours. A contour is described by a sequence of quadratic curves laid out end-to-end. A single quadratic curve is characterized by three points: two endpoints and a single control point which defines the tangents at the endpoints. Because the curves in a single contour are laid out end-to-end, the endpoint of one curve would be a duplicate of the beginning-point of the next curve. Also, if we assume that the entire contour is smooth, meaning that the ending derivative of one curve is the same as the beginning derivative of the next curve, we don’t even need to explicitly state the location of the shared endpoint. We can figure out where the shared endpoint must be given the two control points that straddle it.

Therefore, the way that each contour is described is as a list of points, coupled with a boolean for each point which describes if that point is “on” the contour or “off” the contour. Points that are on the contour are endpoints of a curve, and points that are off the contour are control points. Any arbitrary collection of these (point, on/off boolean) tuples will uniquely define a contour. Note that kinks are allowed because the font author can explicitly specify each of the three points for each quadratic curve (but doesn’t have to). If there is no control point between two on-contour points, a straight line is drawn between the two points. A glyph simply consists of a bunch of these contours.

There’s one last piece we need to know in order to draw a glyph: TrueType dictates that rasterizers use the non-zero winding number rule in order to dictate whether a particular point is inside or outside the glyph. This rule is well-documented, but works like this: From the point you’re trying to determine, draw a ray in any direction. Every time the ray crosses a contour from left-to-right, increment a counter. Every time the ray crosses a contour from right-to-left, decrement the counter. If the counter ends up with a non-zero value, the point is inside the glyph. Note that this rule assumes that glyphs “hold water” meaning that there are no gaps. This means that contours can intersect, and that font authors can cut holds out of closed contours.

It is also worth stating here that all of the coordinates that points are described in this table are integral, meaning that this table can only describe locations on a grid. However, the dimensions of this grid are specified in the “head” (font header) table (sort of. Keep reading). Therefore, different fonts can have different precisions for their contours. It also means that the font system has to scale these coordinates to a pixel-based coordinate system. This scaling is straightforward: Let’s call the user-selected font point size “p”. One point is equal to 1/72 of an inch, so divide p by 72 to get the number of inches the grid should take up, then multiply that by the output device’s pixels per inch (so now we have pixels) and then scale the grid to take up that number of pixels. The TrueType spec defines that this scaling is done in fixed-point with a resolution of 1/64.

It should also be noted that the actual value in the “head” table is “units per Em”. In old days of movable type, the printer would have lots of little squares of metal with raised letters on them. When the printer wanted to print a book, he/she would arrange the little squares to spell out the page of the book, dip the raised letters in ink, then stamp out the page of the book. Therefore, each letter had to fit on one of these little squares. The size of these squares is called an “Em”. On a computer, when you say that you want to you a 12 pt font, one pt equals 1/72 of an inch, so you’re saying to make the EM square 12/72 of an inch big on your screen. The scalar then uses the “units per Em” value to figure out how to scale the glyph. However, nothing is stopping a point on the contour from being greater than the “units per Em” value. This means that the glyph may extend outside of the Em square. This is valid, and even expected, behavior.

There’s one major piece of drawing glyphs that I haven’t discussed. There is a problem when the font size is roughly the size of just a couple pixels on the screen. When that happens, the pixels don’t have enough definition to describe the character properly. The contours described so far describe the platonic ideal of what the glyph should look like, but if the glyph is on a low resolution screen, it may be beneficial to munge about with the control points so that the glyph maps better to the screen. This process is called “font hinting.” Truetype exposes a mechanism to do this, called “glyph instructions.” Yes, that’s right, this is an entire language interpreter that TrueType engines employ to munge about with glyph control points. The font author writes a program in this programming language, and the font runtime executes this program at draw time to move control points around. This programming language is complete with flow control, functions, variables, etc. The only difference between regular programming languages is that the only way to output information is to move control points around - you can’t print output to the console or anything like that. The instruction set for this language is actually quite interesting - there are many domain-specific instructions for making it convenient to move a collection of points along the same vector and things like that. It’s definitely worth taking a look at if you’re interested in domain-specific languages.

Another thing that a font file has to describe is the origin point and advance point for a particular glyph. This is actually fairly straightforward - you render one glyph with respect to its own origin. Then, you “move the pen” to that glyph’s advance point, and start on the next glyph. This is the way that glyphs don’t get bunched up together when you draw them. By convention, the first point in the font file for a glyph is that glyph’s origin, and the last point is it’s advance point.

By the way, the TrueType spec makes no mention of antialiasing, subpixel rendering, or colors. It is under the assumption that a pixel is either 100% on or 100% off, and that pixels are monochrome. That tech is implemented in the rasterizer to make fonts look beautiful. In order to do this correctly, the font system needs to figure out the coverage of a particular square region with regards to the contours. This requires multiple integrals to calculate exactly (which is obviously too slow) so estimations are used instead. Colored glyphs (such as emoji) are rendered using bitmap glyphs, which are completely different.

In this discussion, I’ve described only a few of the tables in TrueType files. However, there are many more. One of the more interesting one is the “bsln” table for describing where various baselines are for the characters (Some languages, like Bengali, use an upper baseline, and some, like Chinese, use a centered baseline). Another interesting one (“lcar”) is due to the fact that ligatures represent multiple characters in a single glyph. Because of this, there might be positions within the glyph that represent logical breaks (so that the caret or selection might appear partway through the glyph). Another (“avar”) is that a font might contain multiple sets of glyph information across various axes (such as font weight). Another (“trak”) describes how to achieve various tracking metrics for glyphs. There are many extension tables that have been added to TrueType, and even more that have been added to OpenType.

[1] https://developer.apple.com/fonts/ttrefman/RM06/Chap6cmap.html

Litherum

Friday, November 29, 2013

Complex Text Handling in WebKit, Part 2: Fonts

No comments:

Post a Comment