Saturday, November 22, 2014

Complex Text Handling in WebKit, Part 7: Width Calculations

There’s one big piece of complex text handling that I haven’t gone into detail with, and that is how we actually calculate the width of a string of text. Calculating the ascent and descent is pretty straightforward - the font knows what its ascent and descent is. However, calculating widths is more involved.

In order to talk about widths, I first need to talk about fonts. WebKit has a different concept of a font than your operating system does. To WebKit, a font comprises of a sequence of FontData objects, which are used for fallback. When you say, for example, <div style=“font-family: Helvetica, Arial;”>text</div>, “Helvetica” refers to a FontData object and “Arial” refers to a FontData object.

There are two subclasses of FontData: SegmentedFontData and SimpleFontData. SimpleFontData simply represents an operating-system font object (CTFontRef on OS X). SegmentedFontData refers to a collection of SimpleFontData objects, where each SimpleFontData is associated with a unicode range. This is how the unicode-range property in @font-face declarations are implemented. Because a Font is actually a collection of platform fonts, when we want to measure the width of a string using a Font, we’re actually specifying a collection of fonts.

There are actually two text code paths in WebKit: a simple path and a complex path. This is because there are many languages where text layout doesn’t require any complex shaping algorithms. In english, each codepoint has a particular glyph associated with it, and each glyph has an associated advance width, and the glyphs are simply put next to each other one by one. For these cases, we can simply make a map of codepoint to glyph, and glyph to width, and use these maps exclusively when calculating text width. If we use the simple path, there is no need to touch platform APIs at all (except to initially fill the map).

However, some languages, such as Arabic, have complex shaping rules about how to lay out glyphs. In particular, the sequence of glyphs chosen for an Arabic letter might be dependent on the letters surrounding it. For this case, we can’t simply make a cache; we need to invoke the platform’s text shaping API.

Therefore, when we want to measure text (Font::width()), the first thing we have to do is determine which codepath to use. There are a variety of things this function (Font::codePath()) considers, but one thing it does is iterates through all the code points, seeing if any come from these complex languages.

As far as WebKit is concerned, the complex codepath is actually easier to understand, since most of the work is delegated to the platform. In particular, each platform has their own implementation of Font::floatWidthForComplexText(), so I can only talk about OS X’s implementation.

The first thing OS X’s implementation of this function does is determines contiguous ranges within the string that are drawn with the same font. This is where font fallback is handled. For each character in the string, it iterates through all the FontDatas inside the Font, testing if that FontData can render the character. This test is performed by creating a CTLine containing only the single character, then iterating through the constituent CTRuns, asking which specific font the platform would use when drawing that particular CTRun. If any of the CTRun’s fonts aren’t the same as the font we are inspecting, the font we are inspecting cannot be used, and we keep searching down the fallback list.

Given this test, it finds contiguous ranges of characters that can be rendered with the same font, and creates a CTLine around the sequence of characters. It then iterates through that CTLine’s constituent CTRuns, remembering the glyphs, advances, and string indices for each glyph in the CTRun, as well as the FontData which gave rise to this run.

However, we’re not done yet. We first have to adjust all the information that CoreText gave us to take into account things like character spacing, tab width, synthetic bold, rounding, collapsing whitespace, etc. This algorithm is straightforward: just iterate through each of the glyphs in each of the runs we found. While we do this, we accumulate all the character advances.

And voilĂ ! We have our width.

The simple text codepath requires much more WebKit code, but the benefit is that we don’t have to interact with platform APIs as often. In addition, the simple text codepath is all platform independent (except for initially populating the caches, as mentioned earlier). This codepath is centered around the WidthIterator class. The idea for WidthIterator::advanceInternal() is that we walk through the string codepoint by codepoint, asking which FontData the current character should be drawn with. We can then ask the FontData what the relevant glyph and advance is for the character. We have to perform the same fixup here regarding character spacing, tab width, etc. that we had to do in the complex case.

So, how do we know what FontData the current character should be drawn with? Well, we build up a large data structure representing all the fonts on the system. The first thing we do is divide up the set of all codepoints into “pages” that each have a set size of code points (currently 256). A GlyphPage contains a pair of SimpleFontData and Glyph ID for each code point contained within the page (There is also a compaction mechanism if all the codepoints would have the same SimpleFontData).

Then, we build up a tree of these GlyphPages. The nodes in the tree are instances of the GlyphPageTreeNode class, which contains a map for FontData to GlyphPageTreeNode that represents its children, as well as a reference to a GlyphPage, and some other less interesting members.

The first level of the tree (just below the root) is actually special. Each of the nodes in the first level is associated with a page number; if you’re interested in the 5th page (meaning codepoints 5 * 256 to 6 * 256) you simply start at the 5th node in this initial level. These are actually held in a static HashMap.

Now, once you have an initial GlyphPageTreeNode, you progress down the tree (in FontGlyphs::glyphDataAndPageForCharacter()) by supplying a FontData object to the node’s children map. The FontData objects are supplied one by one from the Font object (Remember how I said earlier that Fonts are actually a sequence of FontData objects?). You stop when you have hit a node whose page contains a valid glyph and SimpleFontData for your codepoint. If we need a child that doesn’t exist, we create it and populate it at that point (GlyphPageTreeNode::initializePage()). initializePage() works by either calling into CoreText or CoreGraphics in GlyphPage::fill().

So, once we’ve descended through the tree, we’ve found the SimpleFontData that we should draw our character with. How do we then know what the character’s advance width is? Well, SimpleFontData has a GlyphMetricsMap in it which caches these width values. A GlyphMetricsMap contains a map from an int to a GlyphMetricsPage, which simply contains a constant-sized array of advances. If you ask the GlyphMetricsMap what the width for a particular glyph is, it might return cGlyphSizeUnknown, which means that SimpleFontData must then ask the platform what the advance width is, which it does by either calling into CoreText or CoreGraphics.

After we’ve answered all these questions, flow eventually returns to WidthIterator, which continues looping and accumulating glyph advance widths. When the loop concludes, we’ve got our total width.

Complex Text Handling in WebKit, Part 6: Run Layout

Previously we saw how text got transformed from lines to bidi runs, but we still have farther to go until we know exactly where to place each glyph. We have just created a sequence of bidi runs which represent a line, and we have to transform that into a data structure which represents the layout of each run.

The overarching problem that we are trying to solve is to figure out what things to draw and where to draw them when we want to draw a webpage. The usual way that these declarative scenes are drawn is with a scene graph (however, webpages conform to a tree structure), however, overloading the DOM tree to include this rendering information is a bad fit. Instead, we have separated concerns by creating a render tree which represents the scene graph of a DOM tree. The DOM tree is tasked with representing the structure of a webpage (which is what, for example, JavaScript interacts with) and the render tree is tasked with representing the answer to our original question - what to draw and where to draw it. RenderObjects have styles, but DOM nodes don’t; the CSS cascade defines how to assign every RenderObject its own style.

However, the render tree doesn’t go any deeper than a RenderText representing a text node. The RenderText object itself doesn’t have any layout information inside it about where to actually put each of the lines or runs or glyphs. Instead, the block-level container of the RenderText (a RenderBlockFlow instance) has a separate data structure which represents the layout information of all of its inline children.

For example, if you have <div>Lorem <b>ipsum</b></div>, the div itself is a DOM node, and it has a corresponding RenderObject in the render tree. The RenderObject would be a RenderBlockFlow instance, which would have two children: a RenderText (with the string “Lorem “ inside it) and a RenderInline (representing the <b> . Both of these two classes inherit from RenderObject. Note how there is no <b> specific subclass of RenderInline; the difference here is that the RenderInline has a different style than the RenderBlockFlow (namely, a style with a bold font). The RenderInline would have a RenderText child, with the string “ipsum” in it. However, none of these RenderTexts nor the RenderInline have information regarding layout. Instead, the RenderBlockFlow has a RenderLineBoxList member, which represents the layout information for all of the RenderBlockFlow’s inline children. The RenderLineBoxList is simply a list of InlineFlowBoxes.

There are only a few classes in this class hierarchy, so it makes sense to talk about them. InlineBox is the base class for all objects in the InlineBox tree. InlineFlowBox is a subclass which has pointers to children, so this class is used for all the non-leaves in the tree. RootInlineBox is a subclass of InlineFlowBox which represents the entire line, and has functions for dealing with the entire line. InlineTextBox is a subclass of InlineBox (so it can’t have any children) which represents a run of text with a particular directionality. InlineBox has fields which contain layout information.

Back to the algorithm. We’ve got a BidiRunList, and we want to turn that into an InlineBox tree, which includes populating the layout inside InlineBox (such as (x, y) and (width, height)). RenderBlockFlow::createLineBoxesFromBidiRuns() is responsible for this. We initially call RenderBlockFlow::constructLine() which simply creates the InlineBoxTree, then we lay out the tree in a horizontal pass and a vertical pass.

The horizontal pass occurs in computeInlineDirectionPositionsForLine(). Constructing the InlineBox tree isn’t very difficult; for every run in the BidiRunList, we create the appropriate InlineBox subclass instance (by delegating to the renderer, see createInlineBoxForRenderer()) and walk up the RenderObject tree creating InlineBoxes for our ancestors until we encounter a place to join up to the existing tree. We can walk up the RenderObject tree because the BidiRun has a pointer to the RenderObject for which the BidiRun represents a portion of. We can then set the BidiRun’s InlineBox pointer to be the leaf we just created.

Then comes our horizontal layout pass. This happens in two phases: setting widths for each InlineBox, and then setting positions for each InlineBox. Setting widths for an InlineTextBox mainly involves calling Font::width() on the RenderObject’s relevant substring. The RenderObject has a RenderStyle, which contains the Font to use while measuring the substring’s width. If the InlineBox isn’t an InlineTextBox (for example, an image or something), we can simply ask the renderer for its width. Once each InlineBox has a width, placing them horizontally is easy - simply accumulate the widths and set each InlineBox’s X position to the value of the accumulator. Note that the width pass iterates over BidiRuns, but the placement pass is recursive over the InlineBox tree.

Then comes the vertical pass. The idea of the vertical pass is that we want to align the baseline for all the elements in the line. This is already true within a particular InlineTextBox (due to font fallback, not all the characters in an InlineTextBox have to be drawn with the same font), so we have to align all the InlineBoxes so that the baselines align. The vertical pass occurs in computeBlockDirectionPositionsForLine(), and has two passes: a height pass and a position pass.

The height pass is a fairly straightforward recursive pass over the InlineBox tree. For every child, we call RootInlineBox::ascentAndDescentForBox() with the child. For text boxes, we simply iterates through the specific fonts that actually get used when drawing the text (which we know from calculating the InlineTextBox’s width earlier in the horizontal pass), keeping the minimum and maximum ascents and descents for the fonts. For replaced elements (like images), we simply ask the renderer what its height is. We also include things like line-height here. While we are recursing through all our InlineBoxes, we keep track of the maximum ascent and descent (which end up being the ascent and descent for the line as a whole).

Then, we place boxes vertically by calling placeBoxesInBlockDirection(). This is a recursive pass where we place each box’s Y position based on the box’s ascent compared to the line’s ascent. We also handle things here like ruby and emphasis marks.

Now we’ve got layout information regarding each run. Later when we paint we can simply recurse through the tree, and draw each element at it’s constituent position.

Complex Text Handling in WebKit, Part 5: Bidirectional Processing

In a previous post, I described the first step to text layout: line breaking. A line is characterized by two DOM iterators which represent the beginning DOM location and the ending DOM location of the line. An iterator is simply a pointer to a DOM node and an integral offset within that node (which is only used for text nodes; If the node is an tag or something like that, the offset is 0).

The next phase of line layout is bidirectional processing. Bidi processing is needed, for example, if you have a word from a right-to-left language inside a sentence written in a left-to-right language. In this scenario, we don’t want to naively draw the text character by character, because then the characters in the RTL language will be written backwards, causing the result to be indecipherable. Instead, we want to find substrings inside the line that look like they are in a differently directed language, and process those substrings with the correct directionality.

The situation gets more complicated, however, when you consider the case of a LTR word in a RTL phrase in a LTR sentence. Indeed, these nestings can be arbitrarily deep. The model that bidirectional processing uses to represent this is that each character is assigned a “depth”, and even depths represent LTR contexts and odd depths represent RTL contexts. In the previous example, the LTR word would have a depth of 2, the surrounding phrase would have a depth of 1, and the sentence would have a depth of 0. If the base direction of the line is RTL, you simply start your bidirectional processing at a depth of 1.

Once you have all your depths, you can collect sequences of equal depths and handle these sequences independently. Once you have done this, the depth values themselves cease to be important, and you are left with simply a collection of runs, where each run has uniform directionality. In the above example, there would be 5 runs, each alternating directionality.

What’s more, there are ways to provide operators inside the text stream that provide signals to the bidi algorithm. There are 3 pairs of formatting operators, each with a start code point and and end code point: embeddings, overrides, and isolates. Embeddings allow you to be explicit about pushing and popping additional levels onto the depth stack. Overrides allow you to bypass the bidi algorithm, forcing the directionality of the inner text. Isolates specify that the algorithm should be applied recursively and independently to the inner text. There are specific code points which represent each of these operators, as well as HTML5 DOM elements.

The algorithm that actually comes up with all of these depth values involves tables regarding which directionality any given character is, and rules regarding how to respond when you encounter any particular class of character. The algorithm itself is fairly complicated and has to, for example, deal with characters that do not have strong directionality (like punctuation) and with things like making sure that matching brackets always face each other. Luckily, there is a specification for this algorithm, called “UAX #9” from the Unicode consortium. The fact that it is Unicode who is speccing this makes sense, since the algorithm operates on properties of codepoints, which Unicode has defined, and since all text processing programs (both inside web browsers and out) would need to implement this algorithm.

ICU implements this algorithm; however, its implementation is not suitable for WebKit because WebKit needs to handle arbitrary DOM content in a line, rather than simple text. Because of this, WebKit implements their own version of this algorithm, in constructBidiRunsForSegment(), which uses an InlineBidiResolver to populate a BidiRunList. The algorithm itself is inside BidiResolver::createBidiRunsForLine(), and it loops over its content, calls into ICU on each codepoint, updates internal state, and periodically appending runs to the BidiRunList whenever it has finished a run.

From here on out, all layout is performed in terms of these BidiRuns, so it’s worth looking into the data structure a little bit. A BidiRun is a linked list node, so it’s possible to iterate through all the runs in a line by just calling next() on a run. In addition, it has a RenderObject which represents the renderer that the node is created from (encountering a new renderer causes a run to unconditionally be ended), and an InlineBox pointer which represents layout information which will be filled in later. It also has start and stop indices into the renderer, so you know which part of the renderer’s string this run represents, as well as the bidi level.

That’s it for bidirectional processing. Now that we have our BidiRunList, we can create InlineBoxes for the BidiRuns, and then lay out the InlineBoxes.