Sunday, June 28, 2015

AppKit’s relationship with CoreAnimation

The entirety of this discussion comes from [1].

Before CoreAnimation, AppKit had a particular drawing model. In this drawing model, AppKit has a single screen-sized canvas. When AppKit wants to populate a region of this canvas, it constructs a CoreGraphics drawing context targeting the canvas, and asks each of your views which intersect the dirty region of the canvas to recursively draw using that drawing context. Your app then calls functions like drawLine(), drawCircle(), drawText(), etc. on the drawing context, and those commands are realized on the canvas. The canvas then shows up on the screen. There is a simple method you can call to mark a region dirty.
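
To make the dirty-region idea concrete, here is a rough Python sketch of that traversal. It is only a toy model, not AppKit's implementation: the Rect and View classes, the redraw function, and the list standing in for a drawing context are all hypothetical, and it assumes every frame is already in canvas coordinates and that subviews lie inside their parent.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Rect:
    x: float
    y: float
    w: float
    h: float

    def intersects(self, other: Rect) -> bool:
        # Two rectangles overlap only if they overlap on both axes.
        return (self.x < other.x + other.w and other.x < self.x + self.w
                and self.y < other.y + other.h and other.y < self.y + self.h)

@dataclass
class View:
    name: str
    frame: Rect  # hypothetical: expressed in canvas coordinates for simplicity
    subviews: list[View] = field(default_factory=list)

    def draw(self, context: list[str], dirty: Rect) -> None:
        # A real view would issue drawing commands; here we just record the call.
        context.append(f"draw {self.name}")

def redraw(view: View, context: list[str], dirty: Rect) -> None:
    """Ask view, then its subviews, to draw if they touch the dirty region."""
    if not view.frame.intersects(dirty):
        return  # assumes subviews never extend outside their parent
    view.draw(context, dirty)
    for sub in view.subviews:
        redraw(sub, context, dirty)

window = View("content", Rect(0, 0, 800, 600), [
    View("sidebar", Rect(0, 0, 200, 600)),
    View("editor", Rect(200, 0, 600, 600)),
])
commands: list[str] = []
redraw(window, commands, dirty=Rect(300, 100, 50, 50))
print(commands)  # ['draw content', 'draw editor']
```

The sidebar is never asked to draw because it doesn't touch the dirty rectangle, but everything that does touch it has to repaint.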

CoreAnimation, instead, uses a different paradigm, which involves a scene graph of 2D layers. Each layer can have its own canvas. CoreAnimation supports a drawing model similar to AppKit’s: when CoreAnimation wants to populate a region of a layer, it constructs a CoreGraphics drawing context targeting the layer, and then asks either a subclass or the layer’s delegate to draw using that drawing context. Once all the canvases in the scene graph are populated, CoreAnimation takes care of actually updating the screen.

Before CoreAnimation, your window was visually atomic, meaning that any time anything changed, you had to redraw everything dependent on that change with CoreGraphics. However, with CoreAnimation, you chop up all your drawing into layers, which are finer-grained than the entire window. Therefore, when a region of a layer changes, only bits of that layer will need to be redrawn (and all the other layers don’t need to be touched).

In addition, layers can all have differing sizes and positions. This means that, if something moves, and it has its own layer, you can just move the layer, and never have to touch CoreGraphics at all!

I’d like to take a minute to describe the pros and cons of layers. In particular, layers can be moved extremely quickly compared to AppKit views because the CoreGraphics content doesn’t have to be repainted. However, it is very common to have overlapping layers, which means you may be paying the memory cost of a pixel many times. If you have tons of overlapping views, you can definitely run out of memory. This means that, if you ask for a layer, you may be denied. You should gracefully handle that.
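
Some back-of-the-envelope arithmetic shows how quickly the memory cost adds up. This sketch assumes an uncompressed bitmap at 4 bytes per pixel (8-bit RGBA) and a made-up 1440x900 size; the layer_bytes helper is hypothetical, and a real backing store may differ (scale factors, tiling, and so on).

```python
def layer_bytes(width_px: int, height_px: int, bytes_per_pixel: int = 4) -> int:
    """Backing-store size of one layer, assuming an uncompressed bitmap."""
    return width_px * height_px * bytes_per_pixel

# One window-sized canvas versus ten fully overlapping layers of the same size.
single_canvas = layer_bytes(1440, 900)
ten_layers = 10 * layer_bytes(1440, 900)
print(single_canvas)  # 5184000
print(ten_layers)     # 51840000
```

Under those assumptions, ten overlapping full-size layers cost about 52 MB where the single canvas cost about 5 MB, even though the user sees the same number of pixels.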

So that’s pretty cool, but CoreAnimation was invented after AppKit, which means that, if AppKit wants to use CoreAnimation, it has to figure out a way to make the two models work with each other. There are two models for this interaction: layer-backed views and layer-hosted views.

Layer-hosted views are conceptually simpler. A layer-hosted view is a leaf view in the AppKit view hierarchy, but it has its own (possibly complex) CoreAnimation layer tree rooted at it. Therefore, there isn’t much interoperability here; instead, when AppKit encounters this view, it knows to just let CoreAnimation handle it from there.

A layer-backed view has a CoreAnimation layer, but also has child views (which have their own layers). Thus, the AppKit view hierarchy is intact; each view just has this extra property of its backing layer. Now, layers and views have some duplicate functionality: they both have things like position and size and the ability to ask the application to draw, etc. AppKit owns all the layers which back views, and will automatically synchronize all the state changes between the view and the layer. This means if you move the view to a particular place, AppKit will implement that by moving the layer to that place. When the layer needs to be drawn, AppKit will ask the view to draw into the layer’s context. It can do this because AppKit implements the layer’s delegate, so when the layer needs something, it calls up to AppKit.

You opt in to using a layer by setting the NSView property wantsLayer. However, note that the name is “wants.” In particular, because of the memory concerns described above, you may want a layer but not get one. This is okay, though, because there isn’t any reason why your drawRect: method should behave differently depending on whether it’s drawing into a layer or not. If you ask for a layer and don’t get one, it just means that animations will be slower, but your app will continue to behave correctly. Therefore, we have graceful degradation.

AppKit also exposes a getter which returns the layer associated with any view (and may return nil for regular views). This means, however, that you now have access to both pieces of state that AppKit tries to keep synchronized: properties on the view and properties on the layer. Therefore, when you use layer-backed views, there is a set of properties on the layers that you shouldn’t modify (because modifying them will cause CoreAnimation to get out of sync with AppKit).

But it also means that you can set any of the other properties of the CoreAnimation layer which AppKit isn’t responsible for. This is pretty powerful, as you can set the contents of the layer to an image (with a simple property assignment) or give the layer a border, etc. In fact, you may actually have a layer where the entire visual contents of the layer can be described simply by setting properties on the layer! In this case, your drawRect: method would be empty, which means that there is no reason for CoreAnimation to create a CoreGraphics context for the layer at all.

CoreAnimation actually optimizes for this case by allowing you to implement the displayLayer: method on the layer’s delegate. If you do this, it will be called instead of drawLayer:inContext:. Note how displayLayer: doesn’t pass you a CoreGraphics context; instead, you are expected to simply set some properties on the layer which will populate its contents.
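
The dispatch described here can be modeled in a few lines of Python. This is only a toy illustration of the "prefer the property-setting callback, otherwise fall back to context-based drawing" decision, not CoreAnimation's code: the Layer class, both delegate classes, and the snake_case method names are hypothetical stand-ins for the Objective-C delegate methods.

```python
class Layer:
    def __init__(self, delegate):
        self.delegate = delegate
        self.contents = None          # would hold an image or rendered bitmap
        self.background_color = None

    def display(self):
        # Prefer the property-setting path when the delegate offers it.
        if hasattr(self.delegate, "display_layer"):
            self.delegate.display_layer(self)
            return "display_layer"
        # Otherwise create a drawing context (a list of commands in this toy)
        # and ask the delegate to draw into it.
        context = []
        self.delegate.draw_layer(self, context)
        self.contents = context
        return "draw_layer"

class SolidColorDelegate:
    """Describes its whole appearance with layer properties."""
    def display_layer(self, layer):
        layer.background_color = "red"

    def draw_layer(self, layer, context):
        context.append("fill red")

class DrawingOnlyDelegate:
    """Only knows how to draw into a context."""
    def draw_layer(self, layer, context):
        context.append("stroke line")

solid = Layer(SolidColorDelegate())
drawn = Layer(DrawingOnlyDelegate())
print(solid.display(), solid.contents)  # display_layer None
print(drawn.display(), drawn.contents)  # draw_layer ['stroke line']
```

Note that the first layer never gets a drawing context at all, which is exactly the saving the optimization is after.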

AppKit exposes this with the updateLayer method on NSView. You opt in to using this method by overriding the wantsUpdateLayer property to return true. If you do this, then AppKit will cause CoreAnimation to call the layer’s delegate’s displayLayer: method, which AppKit implements by calling the view’s updateLayer method. This is where you can set any properties on the layer you want (except for the ones AppKit manages). Note that, if you implement this, you should also implement drawRect: for graceful degradation if there isn’t enough memory to create a layer for your view.

Before CoreAnimation, animations in AppKit were implemented by an “animator” proxy object. You would set properties on this object as if it were a view, and the animator would take care of updating the real view over time. However, there was no hardware acceleration, which means that each time the animator changed something on the view, the view would be asked to redraw (with CoreGraphics). With CoreAnimation, we would like to draw with CoreGraphics as infrequently as possible. Specifically: once we have drawn into a layer, the backing store of that layer can be animated much faster than we can redraw the layer. Therefore, it’s valuable for AppKit not to tell our view to paint on each frame of an animation, and instead animate the layer. You can prescribe this behavior with the layerContentsRedrawPolicy property. Note that this degrades gracefully; this property is only consulted if you actually get a layer.

Also, because AppKit owns particular properties of the backing layers, you shouldn’t use CoreAnimation to animate those, as that will cause the states to get out of sync. Instead, you should use the AppKit animator proxy object.

[1] https://developer.apple.com/videos/wwdc/2012/#217

Saturday, June 27, 2015

Surprising CSS Layouts

There are a few properties which all come into play when determining a layout described in CSS. However, these can interact in surprising ways. In this discussion, I’m only going to describe styling block elements. Here are the players:
  • Positioning: Blocks can be “positioned,” which means that their position is stated explicitly relative to a “containing block.” You can think of it as the containing block creating a local coordinate system with the origin in its top left, and a positioned descendant stating its position by using the “left,” “top,” “bottom,” or “right” properties, which describe locations inside that coordinate system. Note that the containing block for an element is not simply the element’s parent; instead, it is the nearest ancestor with a “position” of “absolute,” “relative,” or “fixed.”
  • Stacking: You can specify a “z-index” on an element, which describes painting order (its z-order). However, the z-order of an element is only relevant inside that element’s “stacking context.” From the outside looking in, a stacking context is atomic (flattened), and from the inside looking out, only items inside the stacking context have their painting order sorted. Specifying “z-index” on an element also creates a stacking context on that element (and its descendants belong to this stacking context).
  • Clipping: You can specify “overflow: hidden” on an element, and that will cause the element’s contents to be clipped to the bounds of the element. However, positioned descendants whose containing block is outside the overflow:hidden element won't be clipped.
  • Scrolling: Same thing as clipping, except the element has scrollbars and lets you scroll around to see the overflow. You can trigger scrolling with “overflow: scroll” but note how this is the same property that you would use for clipping so you can’t specify both at once.
  • 3D rendering: You can specify 3D transformations on an element, which specify the transform between the element’s local coordinate space and the coordinate space of the element’s parent. Doing so creates a containing block and a stacking context. It also creates a “3D rendering context” which is conceptually the shared 3D space that all elements in the context live in. If you’re on the outside looking in on a 3D rendering context, the contents of the 3D rendering context all get flattened into your local coordinate system when drawn.
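
The containing-block rule from the first bullet can be sketched as a walk up the ancestor chain. This Python model follows the rule exactly as stated above and nothing more; the Element class and containing_block function are hypothetical, and a real browser handles more cases (for instance, the transform case from the last bullet).

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Optional

@dataclass
class Element:
    name: str
    position: str = "static"   # "static", "relative", "absolute", or "fixed"
    parent: Optional[Element] = None

def containing_block(element: Element) -> Optional[Element]:
    """Nearest ancestor whose position is absolute, relative, or fixed.

    Returns None when no ancestor qualifies; a real browser would fall back
    to the page itself in that case.
    """
    ancestor = element.parent
    while ancestor is not None:
        if ancestor.position in ("absolute", "relative", "fixed"):
            return ancestor
        ancestor = ancestor.parent
    return None

outer = Element("outer", position="relative")
scroller = Element("scroller", parent=outer)   # e.g. an overflow: scroll box
green = Element("green", position="absolute", parent=scroller)
print(containing_block(green).name)  # outer
```

The lookup skips right past the scroller because it isn't positioned, which is the setup for the first surprise below.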
Now that we’ve described the concepts at play here, we can combine them. There are a few combinations I’d like to call out as particularly surprising.
  1. You can have overflow: scroll between an element and its containing block. In the following example, both green and blue boxes’ DOM elements are inside the overflow: scroll element, but the green box is position: absolute and its containing block is outside the overflow: scroll. This means that, even though the green box’s DOM element is within the overflow: scroll, it doesn’t move when you scroll. Try scrolling the grey box to see what I mean.
  2. Overflow: scroll doesn’t create a stacking context, which means that its contents are in the same stacking context as its siblings. This means that some element from outside the overflow: scroll can decide to nestle in between the overflow: scroll’s contents (in the z-dimension). In this example, the red box is a sibling of the overflow: scroll, but the green and blue boxes are descendants of the overflow: scroll. Scroll around the grey box to see what I mean.
  3. Clipping doesn’t follow z-order. This means that you can clip an element in one z-index to an element in a completely different z-index. Therefore, you can have some other external element be displayed between the clipper and the clippee. In this example, the green box is being clipped by the red box. However, the blue box, which is external and a sibling to the red one, can get between the two. Note that all the boxes in this example have width: 100px; height: 100px;.
  4. Every time you specify a 3D transformation, it creates a new 3D rendering context, which causes flattening. This means that if you have nested 3D transformations, each transformation gets flattened into the plane of its parent, all the way up the DOM. There is a way to opt out of this behavior (to have a shared 3D rendering context) but it requires an extra CSS property, transform-style: preserve-3d. In this example, the blue square is a descendant of the green square, and both have 3D transformations to rotate them around the Y axis. The first example does not have transform-style: preserve-3d, but the second example does. Mouse over (or tap) each of them to see how they are set up.

Sunday, June 21, 2015

3D on the web

Describing 3D on the web seems a little complicated at first, but it’s fairly straightforward. The concepts involved match existing concepts in CSS quite well, and the CSS Transforms spec doesn’t add that much complexity.

The first thing I want to mention is that transforms are a completely different concept from either animations or transitions. Animations and transitions simply let you describe how CSS values change over time; transforms are just another few CSS properties you can specify on an element. The fact that these different specs are often used together is coincidental.

Before we start discussing transforms, I’d like to take a look back at what HTML and CSS were like before transforms. There is a document, described as a tree of nodes, usually written in HTML. There are also some style key/value pairs, usually written in CSS, which get applied to particular nodes in the document. When a browser wants to actually show a webpage on screen, it simply runs through the document, telling each node to paint. Therefore, nodes get drawn in document order. Because of this, nodes which occur later in the document appear to be on top of previous nodes. You can see this in the picture below: the blue square appears on top of the green square because it appears later in the document.
<div class="square"
    style="left: 0px;
           top: 0px;
           background-color: green;"></div>
<div class="square"
    style="left: 50px;
           top: 50px;
           background-color: blue;"></div>

So, if you have two nodes, and you want to make one appear on top of the other, you simply have to move it after the other in the document.

But wait - shouldn’t the concept of things being on top of other things be a part of style, instead of the document itself? Think about what we just did: we just modified the document itself, just to change the style of how it’s presented. This is, conceptually, a bad move.

In order to address this, CSS has the concept of z-order, which is controlled by the z-index property. You can put this property on a positioned element to change the apparent stacking of the elements. Higher z-index values mean “closer to the user.” Here’s the same example as before, but this time using z-index to change the apparent stacking of the boxes:
<div class="square"
    style="z-index: 2;
           left: 0px;
           top: 0px;
           background-color: green;"></div>
<div class="square"
    style="z-index: 1;
           left: 50px;
           top: 50px;
           background-color: blue;"></div>

When a browser encounters a z-order, it’s important to realize that it isn’t actually moving things in the z-dimension. Instead, it just sorts items by their z-order before drawing them. This isn’t actually 3D; it’s just a reordering.

The reordering only applies to elements which have a z-order specified. If you don’t have a z-order, you’re drawn just like normal, as a part of drawing your closest ancestor that does have a z-order. Therefore, when you specify z-order on some elements, you’re partitioning the document into chunks; the chunks get sorted against each other, and then drawn in turn.

But what happens if you have two z-ordered nodes, and one is a child of the other? Do you just want to disregard everything except the shallowest z-order declaration? Coming up with a global z-ordering for the entire document would be very difficult to do for complicated pages. Instead, we want some way of making some sort of a namespace for stacking, where we can say that within a namespace, chunks will be sorted, but outside of a namespace, that entire namespace is treated as atomic. CSS does this with the concept of a “stacking context.” Using stacking contexts is a good way to encapsulate parts of a webpage.

In CSS, there are many ways to create stacking contexts[1]. There are two straightforward ways:
  1. The “isolation” CSS property. Setting “isolation: isolate” does nothing except create a stacking context on the element it applies to.
  2. Specifying z-index itself (on a positioned element).
The fact that a stacking context is created any time you specify z-index means that we will never have nested z-orders in the same stacking context. Therefore, for any given stacking context, it’s trivial to sort chunks, because none of the chunks intersect.

Here’s an example of stacking contexts. Note that, if you look only at the raw z-index values, the red square should be on the bottom and the yellow square should be on the top. However, because these two elements are contained within their own stacking context, they only get sorted with regard to each other. Then, the red/yellow combination gets treated atomically with respect to the outer stacking context.
<div class="square"
    style="z-index: 2;
           left: 0px;
           top: 0px;
           background-color: green;"></div>
<div style="z-index: 3; position: relative;">
    <div class="square"
        style="z-index: 1;
               left: 50px;
               top: 50px;
               background-color: red;"></div>
    <div class="square"
        style="z-index: 5;
               left: 100px;
               top: 100px;
               background-color: yellow;"></div>
</div>
<div class="square"
    style="z-index: 4;
           left: 150px;
           top: 150px;
           background-color: blue;"></div>

You can even think about this holistically with the concept of a z-order tree (which WebKit has). The non-leaf nodes in the z-order tree are stacking contexts. The leaf nodes are chunks of the document which can be rendered atomically. When you want to render a webpage, you can do a simple traversal of this z-order tree.
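
Here is a small Python sketch of that traversal, applied to the four-square example above. It is a deliberately simplified model rather than what WebKit does: the Node class and paint_order function are hypothetical, and it ignores negative z-index values, unpositioned content, and the stacking context’s own background.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    z_index: int = 0
    children: list[Node] = field(default_factory=list)  # non-empty: stacking context

def paint_order(context: Node) -> list[str]:
    """Leaf chunks in paint order (back to front) under one stacking context."""
    if not context.children:
        return [context.name]
    order: list[str] = []
    # sorted() is stable, so equal z-index values keep document order.
    for child in sorted(context.children, key=lambda n: n.z_index):
        order.extend(paint_order(child))
    return order

page = Node("root", children=[
    Node("green", z_index=2),
    Node("group", z_index=3, children=[
        Node("red", z_index=1),
        Node("yellow", z_index=5),
    ]),
    Node("blue", z_index=4),
])
print(paint_order(page))  # ['green', 'red', 'yellow', 'blue']
```

Red and yellow are only ever compared against each other; the whole group slots in between green and blue because of the group’s own value of 3.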

Alright, let’s now talk about transforms. Transforms are a paint-time property, which means they don’t affect the layout of content. (Web browsers use two passes: layout and rendering. Laying out content determines where everything should go, and rendering actually draws it. When layout happens, transforms are ignored. Then, just before we want to paint an element, we factor in transforms at that point.)

There are two kinds of transforms: 2D transforms and 3D transforms. 2D transforms are actually conceptually very simple - when you want to paint something, just paint it somewhere else. We’ve already got a 2D graphics context; we just need to adjust the context’s 2D CTM (current transformation matrix). No big deal. All 2D drawing libraries support CTMs. However, 3D transforms are a little more complicated.
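
For anyone who hasn’t worked with a CTM before, here is a minimal Python sketch of the math. It stores a 2D affine transform as the six numbers (a, b, c, d, tx, ty), the same layout the CSS matrix() function uses; the apply and concat helpers are hypothetical names, not any drawing library’s API.

```python
# A 2D affine transform as (a, b, c, d, tx, ty):
#   x' = a*x + c*y + tx
#   y' = b*x + d*y + ty
def apply(m, point):
    a, b, c, d, tx, ty = m
    x, y = point
    return (a * x + c * y + tx, b * x + d * y + ty)

def concat(outer, inner):
    """Transform equivalent to applying inner first, then outer."""
    a1, b1, c1, d1, tx1, ty1 = outer
    a2, b2, c2, d2, tx2, ty2 = inner
    return (a1 * a2 + c1 * b2, b1 * a2 + d1 * b2,
            a1 * c2 + c1 * d2, b1 * c2 + d1 * d2,
            a1 * tx2 + c1 * ty2 + tx1, b1 * tx2 + d1 * ty2 + ty1)

translate = (1, 0, 0, 1, 50, 50)   # move by (50, 50)
scale = (2, 0, 0, 2, 0, 0)         # double both axes
ctm = concat(translate, scale)     # scale first, then translate
print(apply(ctm, (10, 10)))        # (70, 70)
```

Painting “somewhere else” just means every point the element draws gets run through the current matrix first.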

If you have content inside a 3D transform, that content doesn’t even know it. Therefore, inside a 3D transform, there is a plane where content gets drawn to. This drawing is the same drawing that we do to draw the element normally.
<div style="position: relative;
            perspective: 800px;">
    <div class="inner"
         style="position: absolute;
                transform: rotateY(20deg);
                background-color: green;">
        Content
    </div>
</div>

Outside the transformed content, however, we have to flatten the 3D parts into the frame buffer (eventually to be shown on the screen). This flattening needs to occur whenever we need to draw something with a 3D transform into a frame buffer.

So what happens if we have nested transforms? Well, like I said before, content that is in-between the two transformed elements doesn’t know that it’s transformed; it just paints like any normal 2D element. That means, when we go to draw the inner transformed element, it will be flattened into the plane of the outer transformed element - NOT the plane of the root document!

This is the concept of a “3D rendering context,” similar to a stacking context. Here, each time you specify a 3D transform, you are creating a 3D rendering context on that element. Anything that’s drawn as a child of that element gets flattened into the plane of the context. 

You can see that in the following markup. The blue square is a child of the green square, and both have a rotation transform applied to them. You can see that the blue square is being rotated, but we only see it after the rotation is projected onto its parent’s plane. This projection is the same operation that the green square undergoes to be shown on the root document (our monitors). (Note that the flashing occurs because the blue and green squares are coplanar, so you’re only ever seeing half the blue square at a time.)
<div style="position: relative;
            perspective: 800px;">
    <div class="inner"
         style="position: absolute;
                transform: rotateY(20deg);
                background-color: green;">
        <div class="inner"
             style="position: absolute;
                    transform: rotateY(20deg);
                    background-color: blue;">
        </div>
    </div>
</div>

This kind of sucks. Every other place (modeling software, game engines, etc.) that describes a hierarchy of transformations doesn’t project everything into the plane of its parent. The reason CSS has to do it is that we have to preserve the concept of a “document,” which is 2D.

However, all is not lost. The CSS designers thought of this problem, and created another CSS property, transform-style, which gives you more control over which elements belong to which 3D rendering context. In particular, there are 2 values: flat and preserve-3d. Flat specifies the behavior described above. Preserve-3d specifies that the element’s children should not be flattened into its plane; instead, they belong to the same 3D rendering context as the element. With this value, no flattening occurs at that element, and its children live in the same 3D space that it does.
<div style="position: relative;
            perspective: 800px;">
    <div class="inner"
         style="position: absolute;
                transform: rotateY(20deg);
                background-color: green;
                transform-style: preserve-3d;">
        <div class="inner"
             style="position: absolute;
                    transform: rotateY(20deg);
                    background-color: blue;">
        </div>
    </div>
</div>
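
To see numerically what flattening changes, here is a small Python sketch that tracks one point on the blue square through both setups. It makes simplifying assumptions that the markup doesn’t: both rotations are about the same Y axis through the origin, and the projection is orthographic (the perspective: 800px is ignored). The rotate_y and flatten helpers are hypothetical.

```python
import math

def rotate_y(point, degrees):
    """Rotate (x, y, z) about the Y axis, as CSS rotateY() does."""
    x, y, z = point
    t = math.radians(degrees)
    return (x * math.cos(t) + z * math.sin(t), y, -x * math.sin(t) + z * math.cos(t))

def flatten(point):
    """Project onto the z = 0 plane (orthographic; ignores perspective)."""
    x, y, _ = point
    return (x, y, 0.0)

corner = (100.0, 0.0, 0.0)   # a point on the blue square, 100px from the axis

# Flat: blue is flattened into green's plane before green rotates.
flat = flatten(rotate_y(flatten(rotate_y(corner, 20)), 20))

# preserve-3d: both rotations happen in one 3D space,
# and flattening happens only once, at the end.
shared = flatten(rotate_y(rotate_y(corner, 20), 20))

print(round(flat[0], 2))    # 88.3
print(round(shared[0], 2))  # 76.6
```

With flattening, the point ends up at 100·cos²(20°); in a shared 3D space the two rotations add up to 40 degrees, giving 100·cos(40°). The blue square really does look different in the two cases.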

So, if you only want one 3D space, and all your transforms to nest inside it, specify transform-style: preserve-3d on every element that has transformed descendants (like the green square above); the innermost elements don’t need it.

[1] https://developer.mozilla.org/en-US/docs/Web/Guide/CSS/Understanding_z_index/The_stacking_context