- Which encoding system used? (The answer is almost always “Unicode”)
- If Unicode, is it encoded with UTF-8, UTF-16, or UTF-32? (The answer is almost never “UTF-32”)
- If UTF-16 or UTF-32, which endianness of the code units is used?
Internally, WebKit uses UTF-16 machine-endianness strings (Though there is a compile-time flag to enable using UTF-8 strings internally). This has one interesting repercussion: in order to keep String::size() fast, the function returns the number of code units in the string, not the number of code points.