Litherum: Complex Text Handling in WebKit, Part 5: Bidirectional Processing

In a previous post, I described the first step to text layout: line breaking. A line is characterized by two DOM iterators which represent the beginning DOM location and the ending DOM location of the line. An iterator is simply a pointer to a DOM node and an integral offset within that node (which is only used for text nodes; If the node is an tag or something like that, the offset is 0).

The next phase of line layout is bidirectional processing. Bidi processing is needed, for example, if you have a word from a right-to-left language inside a sentence written in a left-to-right language. In this scenario, we don’t want to naively draw the text character by character, because then the characters in the RTL language will be written backwards, causing the result to be indecipherable. Instead, we want to find substrings inside the line that look like they are in a differently directed language, and process those substrings with the correct directionality.

The situation gets more complicated, however, when you consider the case of a LTR word in a RTL phrase in a LTR sentence. Indeed, these nestings can be arbitrarily deep. The model that bidirectional processing uses to represent this is that each character is assigned a “depth”, and even depths represent LTR contexts and odd depths represent RTL contexts. In the previous example, the LTR word would have a depth of 2, the surrounding phrase would have a depth of 1, and the sentence would have a depth of 0. If the base direction of the line is RTL, you simply start your bidirectional processing at a depth of 1.

Once you have all your depths, you can collect sequences of equal depths and handle these sequences independently. Once you have done this, the depth values themselves cease to be important, and you are left with simply a collection of runs, where each run has uniform directionality. In the above example, there would be 5 runs, each alternating directionality.

What’s more, there are ways to provide operators inside the text stream that provide signals to the bidi algorithm. There are 3 pairs of formatting operators, each with a start code point and and end code point: embeddings, overrides, and isolates. Embeddings allow you to be explicit about pushing and popping additional levels onto the depth stack. Overrides allow you to bypass the bidi algorithm, forcing the directionality of the inner text. Isolates specify that the algorithm should be applied recursively and independently to the inner text. There are specific code points which represent each of these operators, as well as HTML5 DOM elements.

The algorithm that actually comes up with all of these depth values involves tables regarding which directionality any given character is, and rules regarding how to respond when you encounter any particular class of character. The algorithm itself is fairly complicated and has to, for example, deal with characters that do not have strong directionality (like punctuation) and with things like making sure that matching brackets always face each other. Luckily, there is a specification for this algorithm, called “UAX #9” from the Unicode consortium. The fact that it is Unicode who is speccing this makes sense, since the algorithm operates on properties of codepoints, which Unicode has defined, and since all text processing programs (both inside web browsers and out) would need to implement this algorithm.

ICU implements this algorithm; however, its implementation is not suitable for WebKit because WebKit needs to handle arbitrary DOM content in a line, rather than simple text. Because of this, WebKit implements their own version of this algorithm, in constructBidiRunsForSegment(), which uses an InlineBidiResolver to populate a BidiRunList. The algorithm itself is inside BidiResolver::createBidiRunsForLine(), and it loops over its content, calls into ICU on each codepoint, updates internal state, and periodically appending runs to the BidiRunList whenever it has finished a run.

From here on out, all layout is performed in terms of these BidiRuns, so it’s worth looking into the data structure a little bit. A BidiRun is a linked list node, so it’s possible to iterate through all the runs in a line by just calling next() on a run. In addition, it has a RenderObject which represents the renderer that the node is created from (encountering a new renderer causes a run to unconditionally be ended), and an InlineBox pointer which represents layout information which will be filled in later. It also has start and stop indices into the renderer, so you know which part of the renderer’s string this run represents, as well as the bidi level.

That’s it for bidirectional processing. Now that we have our BidiRunList, we can create InlineBoxes for the BidiRuns, and then lay out the InlineBoxes.

Litherum

Saturday, November 22, 2014

Complex Text Handling in WebKit, Part 5: Bidirectional Processing

No comments:

Post a Comment