<h2 style="text-align: left;">So I wrote a double delete...</h2><p>Litherum, 2024-03-18</p><p>I wrote a double delete. Actually, it was a double autorelease. Here's a fun story describing the path I took to figure out the problem.</p><p>I'm essentially writing a plugin to a different app, and the app I'm plugging into is closed-source. So, I'm making a dylib which gets loaded at runtime. I'm doing this on macOS.</p><h3 style="text-align: left;">The Symptom</h3><p>When running my code, I'm seeing this:</p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEjlXfswMQ2x8tcDJr5tOKLn3S8J3Yboo1qa2rCiIMInAxuedYhqxTSPOfbV_vlcxMa_4jlpEV_bviI2VCeTPGtTsFW8SX-xWMPf8O0Wtg1Z7eOxjr7WL1cVEk6AkH5LaYw45gDrc2kkXAKP6GdQ8S6jiWr50wgksN1esGFiND2dEiy5PhHMj0q_WouBKgfX" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="986" data-original-width="2044" height="308" src="https://blogger.googleusercontent.com/img/a/AVvXsEjlXfswMQ2x8tcDJr5tOKLn3S8J3Yboo1qa2rCiIMInAxuedYhqxTSPOfbV_vlcxMa_4jlpEV_bviI2VCeTPGtTsFW8SX-xWMPf8O0Wtg1Z7eOxjr7WL1cVEk6AkH5LaYw45gDrc2kkXAKP6GdQ8S6jiWr50wgksN1esGFiND2dEiy5PhHMj0q_WouBKgfX=w640-h308" width="640" /></a></div><br />Let's see what we can learn from this.<p></p><p>First, it's a crash. We can see we're accessing memory that we shouldn't be accessing.</p><p>Second, it's inside <span style="font-family: courier;">objc_release()</span>. We can use a bit of deductive reasoning here: If the object we're releasing has a positive retain count, then the release shouldn't crash.
Therefore, either we're releasing something that isn't an object, or we're releasing something that has a retain count of 0 (meaning: a double release).</p><p>Third, we can actually read a bit of the assembly to understand what's happening. The first two instructions are just a way to check if <span style="font-family: courier;">%rdi</span> is null, and, if so, jump to an address that's later in the function. Therefore, we can deduce that <span style="font-family: courier;">%rdi</span> isn't null.</p><p><span style="font-family: courier;">%rdi</span> is interesting because it's the register that holds the first argument. It's probably a safe assumption that <span style="font-family: courier;">objc_release()</span> takes a single argument, that the argument is a pointer, and that the pointer is stored in <span style="font-family: courier;">%rdi</span>. This assumption is somewhat validated by reading the assembly: nothing seems to be using any of the other parameter registers.</p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEgQPrbkAUtzqnH2ExSrx_C1rwdoWR9puN2nWSC5QEKI5TSMLlAiX-6DBurzzr4Vs54WQSKWP17QrJzmDkD5guVD4GLzpqr4mMS3BIkEl1KFzplinywNPFm-Ym5scudhr7a_os6CeE-N72uqmnq4o6N6HzsNa18cOC8JYED8AGvjcDZxTy8hXHaUFMlhE9gS" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="1600" data-original-width="2308" height="444" src="https://blogger.googleusercontent.com/img/a/AVvXsEgQPrbkAUtzqnH2ExSrx_C1rwdoWR9puN2nWSC5QEKI5TSMLlAiX-6DBurzzr4Vs54WQSKWP17QrJzmDkD5guVD4GLzpqr4mMS3BIkEl1KFzplinywNPFm-Ym5scudhr7a_os6CeE-N72uqmnq4o6N6HzsNa18cOC8JYED8AGvjcDZxTy8hXHaUFMlhE9gS=w640-h444" width="640" /></a></div><p></p><p>The next 3 lines check if the low bit in <span style="font-family: courier;">%rdi</span> is 1 or not. If it's 1, then we again jump to an address that's later in the function.
Therefore, we can deduce that <span style="font-family: courier;">%rdi</span> is an even number (its low bit isn't 1).</p><p>The next 3 lines load a value that <span style="font-family: courier;">%rdi</span> is pointing to, and mask off most of its bits. The next line, which is the line that's crashing, is trying to load the value that the result points to.</p><p>All this makes total sense: Releasing a null pointer should do nothing, and releasing tagged pointers (which I'm assuming are marked by having their low bit set to 1) should do nothing as well. If the argument is an Objective-C object, it looks like we're trying to load the <span style="font-family: courier;">isa</span> pointer, which probably holds something useful at offset <span style="font-family: courier;">0x20</span>. That's the point where we're crashing.</p><p>That leads to the deduction: Either the thing we're trying to release isn't an Objective-C object, or it's already been released, and the release procedure clears (or somehow poisons) the <span style="font-family: courier;">isa</span> value, which caused this crash. Either way, we're releasing something that we shouldn't be releasing.</p><p>One of the really useful observations about the assembly is that nothing before the crash point clobbers the value of <span style="font-family: courier;">%rdi</span>. 
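</p>

<p>Putting those checks together, the preamble we just walked through behaves roughly like this C sketch. (The struct layout, the names, the mask constant, and the meaning of offset <span style="font-family: courier;">0x20</span> are all my guesses from reading the disassembly, not Apple's actual implementation.)</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rough C rendering of the first few instructions of objc_release(), as
 * read from the disassembly. Everything here is a guess for illustration. */
typedef struct {
    uintptr_t isa_bits; /* isa pointer mixed together with flag bits */
} fake_objc_object;

/* Some plausible mask; the real constant is whatever the assembly shows. */
#define FAKE_ISA_MASK 0x00007ffffffffff8ULL

int release_would_dereference(fake_objc_object *obj) {
    if (obj == NULL)
        return 0;                   /* releasing nil: do nothing */
    if ((uintptr_t)obj & 1)
        return 0;                   /* low bit set: tagged pointer, skip */
    uintptr_t isa = obj->isa_bits & FAKE_ISA_MASK; /* mask off flag bits */
    /* The real code next loads *(isa + 0x20); if a prior release cleared
     * or poisoned isa, that load is exactly the crash in the screenshot. */
    return isa != 0;
}
```

<p>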
This means that a pointer to the object that's getting erroneously released is <i>still</i> in <span style="font-family: courier;">%rdi</span> at the crash site.</p><p>We can also see that the crash is happening inside <span style="font-family: courier;">AutoreleasePool</span>:</p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEiMC26gqUah_X3PR98UKUZkPMiAAkbmuHgwkGBKuYjp4Uftot3Gh60SuaZ4u7hq2e0Ej0-Sj5S0ZJ4a1inkr4X0yAHOUsP7f3vzzK0Ix2C-cWSPU8d0FbjUcZ95ZZl8Eylc6yRcFCoLBmY0eWV8fP4GskdSlL5GZJpefOtjzW6sXCQRTE-QGmDIEPdZDRAA" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="794" data-original-width="760" height="640" src="https://blogger.googleusercontent.com/img/a/AVvXsEiMC26gqUah_X3PR98UKUZkPMiAAkbmuHgwkGBKuYjp4Uftot3Gh60SuaZ4u7hq2e0Ej0-Sj5S0ZJ4a1inkr4X0yAHOUsP7f3vzzK0Ix2C-cWSPU8d0FbjUcZ95ZZl8Eylc6yRcFCoLBmY0eWV8fP4GskdSlL5GZJpefOtjzW6sXCQRTE-QGmDIEPdZDRAA=w613-h640" width="613" /></a></div><p></p><p>This doesn't indicate much - just that we're autoreleasing the object instead of releasing it directly. It also means that, because autorelease is delayed, we can't see anything useful in the stack trace. (If we were releasing directly instead of autoreleasing, we could see exactly what caused it in the stack trace.)</p><h3 style="text-align: left;">The First Thing That Didn't Work</h3><p>The most natural solution would be "Let's use Instruments!"
It's supposed to have a tool that shows all the retain stacks and release stacks for every object.</p><p>When running with Instruments, we get a nice crash popup showing us that we crashed:</p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEj9GcM4-x9r1mk6BpquxQVNApdrjZW9aLut3elx8Rf45SqOZd4-BH3etz_W_h0pYWOp789Ito1-mRkpkLAIl_5VL8GXuLE74uqEVe1otDxnLHAblo6pwfQ_EGX0EXpfjaGda9HPJKXYR1kc7IoX3oyrIW1RKTSs3MXgBn2xt_t3cIWMvETX7K3xdxViU6gq" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="1752" data-original-width="1990" height="563" src="https://blogger.googleusercontent.com/img/a/AVvXsEj9GcM4-x9r1mk6BpquxQVNApdrjZW9aLut3elx8Rf45SqOZd4-BH3etz_W_h0pYWOp789Ito1-mRkpkLAIl_5VL8GXuLE74uqEVe1otDxnLHAblo6pwfQ_EGX0EXpfjaGda9HPJKXYR1kc7IoX3oyrIW1RKTSs3MXgBn2xt_t3cIWMvETX7K3xdxViU6gq=w640-h563" width="640" /></a></div><p></p><p>The coolest part about this is that it shows us the register state at the crash site, which gives us <span style="font-family: courier;">%rdi</span>, the pointer to the object getting erroneously released.</p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEgVMu0VX1Lt0ons2_Y66o03br0_pv6w0TRvupZEUYBSUWcqAl56-4YjT-KhH06gStyQdXrcWBBQtfQoyjA9O5RQ0BZVo4idx2J9KSGmDDN9u2DDoEnej1gncl0scyzFQca1_iw2rSbKjQ3lD5iepM4_jPPdAAJAD4k0YawnZNjjuQz0j2mNSfzACAHu54RB" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="1752" data-original-width="1990" height="563" src="https://blogger.googleusercontent.com/img/a/AVvXsEgVMu0VX1Lt0ons2_Y66o03br0_pv6w0TRvupZEUYBSUWcqAl56-4YjT-KhH06gStyQdXrcWBBQtfQoyjA9O5RQ0BZVo4idx2J9KSGmDDN9u2DDoEnej1gncl0scyzFQca1_iw2rSbKjQ3lD5iepM4_jPPdAAJAD4k0YawnZNjjuQz0j2mNSfzACAHu54RB=w640-h563" width="640" /></a></div><p></p><p>Cool, so the object which is getting erroneously released is at <span style="font-family: 
courier;">0x600002a87b40</span>. Let's see what Instruments lists for that address:</p><p></p><div class="separator" style="clear: both; text-align: center;"><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEjxK7f2e4PQbmTfM67RVO9clkuN9N_DYWxw40y2H2L3RwCbJdCgSvVa3mpjbUVelnu1wwSrySNv6HGLEMzTqnjdStitzV-99XjlKJr-EvfJ0FpoTobEzIt_Mgx3ifVtsds0mwuawc-Ryf5iryvjkJFJS_awSZaBCDb1W8EF8t6uMB0mkSxrNrcIS1Ixj3Ld" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="1602" data-original-width="2024" height="507" src="https://blogger.googleusercontent.com/img/a/AVvXsEjxK7f2e4PQbmTfM67RVO9clkuN9N_DYWxw40y2H2L3RwCbJdCgSvVa3mpjbUVelnu1wwSrySNv6HGLEMzTqnjdStitzV-99XjlKJr-EvfJ0FpoTobEzIt_Mgx3ifVtsds0mwuawc-Ryf5iryvjkJFJS_awSZaBCDb1W8EF8t6uMB0mkSxrNrcIS1Ixj3Ld=w640-h507" width="640" /></a></div></div><p></p><p>Well, it didn't list anything for that address. It listed something for an address just before it, and just after it, but not what we were looking for. Thanks for nothing, Instruments.</p><h3 style="text-align: left;">The Second Thing That Didn't Work</h3><p>Well, I'm allocating and destroying objects in my own code. Why don't I try adding logging to all of my own objects to see where they all get retained and released!
Hopefully, by cross referencing the address of the object that gets erroneously deleted with the logging of the locations of my own objects, I'll be able to tell what's going wrong.</p><p>We can do this by overriding the <span style="font-family: courier;">-[NSObject release]</span> and <span style="font-family: courier;">-[NSObject retain]</span> calls:</p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEjQkmWWlK873Ym8EZ2Ip12m0aNTvTeH5VBqoz6Y9PAwLysNvF3Fda1srEn6EmJNw_hijiQZ040DdP-CNpibg-fxffCCIZO5Mw98Sf36rB1TvGKCJJSjR2UzULYp8mvOfs7CH_rc6JQqRL1_6NxdtrRj2g1F6FJHtolOgqwkzlsHEny-lCeU0PR073MCiLf5" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="472" data-original-width="1632" height="186" src="https://blogger.googleusercontent.com/img/a/AVvXsEjQkmWWlK873Ym8EZ2Ip12m0aNTvTeH5VBqoz6Y9PAwLysNvF3Fda1srEn6EmJNw_hijiQZ040DdP-CNpibg-fxffCCIZO5Mw98Sf36rB1TvGKCJJSjR2UzULYp8mvOfs7CH_rc6JQqRL1_6NxdtrRj2g1F6FJHtolOgqwkzlsHEny-lCeU0PR073MCiLf5=w640-h186" width="640" /></a></div><p></p><p>As well as <span style="font-family: courier;">init</span> / <span style="font-family: courier;">dealloc</span>:</p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEhcaIZYc83200NE96hoUxi3p1iI6ivBQRGvUlTXmf5-K6E4TQwWhx3NCfpkv1sC99KufkpHfLdmWfbOIBXQU3cEUMZQry6L1yGt-WYLp_-SvgwEQNh1kWcv_IDTIBRoIWuoyKIOqO3wPZrfm31KOw_8UV1eW5950uJnisX-bnGKvHXa1WEmYN40npFdrRZJ" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="692" data-original-width="1272" height="348" src="https://blogger.googleusercontent.com/img/a/AVvXsEhcaIZYc83200NE96hoUxi3p1iI6ivBQRGvUlTXmf5-K6E4TQwWhx3NCfpkv1sC99KufkpHfLdmWfbOIBXQU3cEUMZQry6L1yGt-WYLp_-SvgwEQNh1kWcv_IDTIBRoIWuoyKIOqO3wPZrfm31KOw_8UV1eW5950uJnisX-bnGKvHXa1WEmYN40npFdrRZJ=w640-h348" width="640" /></a></div><p></p><p>Unfortunately, this spewed out a bunch of 
logging, but the only thing it told me was that the object being erroneously released wasn't one of my own objects. It must be some other object (<span style="font-family: courier;">NSString</span>, <span style="font-family: courier;">NSArray</span>, etc.).</p><h3 style="text-align: left;">The Third Thing That Didn't Work</h3><p>Okay, we know the object is being erroneously autoreleased. Why don't we log some useful information every time anyone autoreleases anything? We can add a symbolic breakpoint on <span style="font-family: courier;">-[NSObject autorelease]</span>.</p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEiH-g-rDWPF1mH8GTq4FF7MKNnf1CYNbmUmMShi6rzed3oyQKp9LumIEp1gQZj4iGu19OzFz4wj9JjkWXN0zSICYFFOYc0DXvPEMbyk52WjALRBsTVz3NNkBsuhT6R-XqZ6T_Mh7MjSQqxL81fvohg6SSzKhFZdgXM5sYsda6ZOqGQAtgo-SFfUUyHnoyWZ" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="488" data-original-width="932" height="336" src="https://blogger.googleusercontent.com/img/a/AVvXsEiH-g-rDWPF1mH8GTq4FF7MKNnf1CYNbmUmMShi6rzed3oyQKp9LumIEp1gQZj4iGu19OzFz4wj9JjkWXN0zSICYFFOYc0DXvPEMbyk52WjALRBsTVz3NNkBsuhT6R-XqZ6T_Mh7MjSQqxL81fvohg6SSzKhFZdgXM5sYsda6ZOqGQAtgo-SFfUUyHnoyWZ=w640-h336" width="640" /></a></div><p></p><p>Here's what it looks like when this breakpoint is hit:</p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEj86J3i_yqudwLxKyxgamJpbNdbgRkyw9KtzxWARJthwubmFk1DpzpWK4cGOGp8zxKOb2ESLDOfBdBzZuEqkeL-GgdXv_sqruEKxcqhUlahRpKiCcvkdKyI7n_HLbTT_J3mmAC_gA-Co_xsZd2DCFOKeTA-tEYPUBKTpZOL9rV_TZFzfnAIDmRmaWWmUzmS" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="540" data-original-width="1970" height="176"
src="https://blogger.googleusercontent.com/img/a/AVvXsEj86J3i_yqudwLxKyxgamJpbNdbgRkyw9KtzxWARJthwubmFk1DpzpWK4cGOGp8zxKOb2ESLDOfBdBzZuEqkeL-GgdXv_sqruEKxcqhUlahRpKiCcvkdKyI7n_HLbTT_J3mmAC_gA-Co_xsZd2DCFOKeTA-tEYPUBKTpZOL9rV_TZFzfnAIDmRmaWWmUzmS=w640-h176" width="640" /></a></div><p></p><p>Interesting - so it looks like all calls to <span style="font-family: courier;">-[NSObject autorelease]</span> are immediately redirected to <span style="font-family: courier;">_objc_rootAutorelease()</span>. The <span style="font-family: courier;">self</span> pointer is preserved as the value of the first argument.</p><p>If you list the registers at the time of the call, you can see the object being released:</p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEgRHTrx-1seQZ763KsXVUDnYXlbvFOyXMMccPn52C8kbz4VtSgfaOdS5QjoT7KmhvRLbVxsfoOxrJGjKgHOq5w65zpgZT3txIgarYtOP0hvCT0Jg2JV8qjCfsTQ2BwTQ5SfGUWCSpSS2rWm4-jT0C7vldDv0Ai4O129B1tKzRq4gqP41GVuet1t3kHXfLMN" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="586" data-original-width="594" height="632" src="https://blogger.googleusercontent.com/img/a/AVvXsEgRHTrx-1seQZ763KsXVUDnYXlbvFOyXMMccPn52C8kbz4VtSgfaOdS5QjoT7KmhvRLbVxsfoOxrJGjKgHOq5w65zpgZT3txIgarYtOP0hvCT0Jg2JV8qjCfsTQ2BwTQ5SfGUWCSpSS2rWm4-jT0C7vldDv0Ai4O129B1tKzRq4gqP41GVuet1t3kHXfLMN=w640-h632" width="640" /></a></div><p></p><p>So let's modify the breakpoint to print all the information we're looking for:</p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEjzmQwkA5KWpNpBu88j3GOJCsF07q04yVcO21GjyC3wPn0LlUiWdKMVYmpubcOn5q_YIKjKsdTjDfpvXX74ByRpbSfuEWq72r9225sOoCuPgYBPTjCVbarSde3ZWHf21_1gUKI5IWV7Wc8Ou4Al1MsxFz_IuBDbHtXBUYXgKZTky4byeGu-yE5805Captpc" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="868" data-original-width="922" height="602"
src="https://blogger.googleusercontent.com/img/a/AVvXsEjzmQwkA5KWpNpBu88j3GOJCsF07q04yVcO21GjyC3wPn0LlUiWdKMVYmpubcOn5q_YIKjKsdTjDfpvXX74ByRpbSfuEWq72r9225sOoCuPgYBPTjCVbarSde3ZWHf21_1gUKI5IWV7Wc8Ou4Al1MsxFz_IuBDbHtXBUYXgKZTky4byeGu-yE5805Captpc=w640-h602" width="640" /></a></div><p></p><p>Unfortunately, this didn't work because it was too slow. Every time lldb evaluates something, it takes a bunch of time, and this was evaluating 3 things every time anybody wanted to autorelease anything, which is essentially all the time. The closed-source application I'm debugging is sensitive enough that if anything takes too long, it just quits.</p><h3 style="text-align: left;">The Fourth Thing That Didn't Work</h3><p>Let's try to print out the same information as before, but do it inside the application rather than in lldb. That way, it will be much faster.</p><p>The way we can do this is with something called "function interposing." This uses a <a href="https://opensource.apple.com/source/dyld/dyld-97.1/include/mach-o/dyld-interposing.h.auto.html">feature</a> of dyld which can replace a library's function with your own.
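</p>

<p>For context, interposing works by planting (replacement, replacee) pointer pairs in a special <span style="font-family: courier;">__DATA,__interpose</span> section that dyld reads at load time. Here's a C sketch modeled on the macro in Apple's <span style="font-family: courier;">dyld-interposing.h</span>, with a toy local function standing in for the real replacee; the <span style="font-family: courier;">#else</span> branch just lets it compile on non-Mach-O platforms, where the mechanism doesn't exist.</p>

```c
#include <assert.h>
#include <stdio.h>

#ifdef __APPLE__
/* Mirrors the macro in Apple's dyld-interposing.h: plant a
 * (replacement, replacee) pointer pair in the __DATA,__interpose
 * section, which dyld reads when the image is loaded. */
#define DYLD_INTERPOSE(_replacement, _replacee)                              \
    __attribute__((used)) static struct {                                    \
        const void *replacement;                                             \
        const void *replacee;                                                \
    } _interpose_##_replacee                                                 \
    __attribute__((section("__DATA,__interpose"))) = {                       \
        (const void *)(unsigned long)&_replacement,                          \
        (const void *)(unsigned long)&_replacee};
#else
/* Interposing is a Mach-O/dyld mechanism; compile to nothing elsewhere. */
#define DYLD_INTERPOSE(_replacement, _replacee)
#endif

/* Toy stand-in for the library function being interposed; in the real
 * dylib the replacee would be _objc_rootAutorelease(). */
static int forwarded = 0;
void *some_library_function(void *obj) { forwarded++; return obj; }

/* Our replacement logs, then forwards to the original. */
void *my_replacement(void *obj) {
    fprintf(stderr, "autoreleasing %p\n", obj);
    return some_library_function(obj);
}

DYLD_INTERPOSE(my_replacement, some_library_function)
```

<p>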
Note that this only works if you disable SIP and set the nvram variable <span style="font-family: courier;">amfi_get_out_of_my_way=0x1</span> and reboot.</p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEjaE1nVY0EOfb4liBj75KaiptV1i3caYH1JlFyr-v1LBL3u2icig5ZqBAvbiwKPZMzaSMp_bciZQdyZeWX_RMNBFD4UsPYulaUyo2ZpBinYU6CHvPRGf_uhgNdYKV6wNqrKdW8Lb8v_Ni4dRUUMqFTqsuRIo37G8TuLZb_-MT4V_9W0An4J7Gvqq-4pYsBF" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="734" data-original-width="2596" height="180" src="https://blogger.googleusercontent.com/img/a/AVvXsEjaE1nVY0EOfb4liBj75KaiptV1i3caYH1JlFyr-v1LBL3u2icig5ZqBAvbiwKPZMzaSMp_bciZQdyZeWX_RMNBFD4UsPYulaUyo2ZpBinYU6CHvPRGf_uhgNdYKV6wNqrKdW8Lb8v_Ni4dRUUMqFTqsuRIo37G8TuLZb_-MT4V_9W0An4J7Gvqq-4pYsBF=w640-h180" width="640" /></a></div><p></p><p>We can do this to swap out all calls to <span style="font-family: courier;">_objc_rootAutorelease()</span> with our own function.</p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEiY7A7lOjboTQvAwQqlKvRKdFiWE9tnBMm8q9DgwPlaMYU-3P7PG17Xu3UZrxBbalXLX6w7XMaFcqh_AShArhbcMTJi_Nh-FxeYXruM5Z1PhEHinF6Jm6C86w8QqntFWp3XggixLT2j_Uc4oLctoWjWzSFblrocSSHNn6P_n7qRCyIMBmN1gpR727QgWemb" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="46" data-original-width="1088" height="28" src="https://blogger.googleusercontent.com/img/a/AVvXsEiY7A7lOjboTQvAwQqlKvRKdFiWE9tnBMm8q9DgwPlaMYU-3P7PG17Xu3UZrxBbalXLX6w7XMaFcqh_AShArhbcMTJi_Nh-FxeYXruM5Z1PhEHinF6Jm6C86w8QqntFWp3XggixLT2j_Uc4oLctoWjWzSFblrocSSHNn6P_n7qRCyIMBmN1gpR727QgWemb=w640-h28" width="640" /></a></div><p></p><p>Inside our own version of <span style="font-family: courier;">_objc_rootAutorelease()</span>, we want to keep track of everything that gets autoreleased. 
So, let's keep track of a global dictionary, from pointer value to info string.</p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEi8VO7_RyXVSlP46YzORpt4Ji9WCs1gcFUZkDOEKEmKZpCpLio7DR57-NX-RJ2vGe2YwLVFnPzLcMFiXKyTr-fkWYpWZfXOD8YMuOkjj8RYhxmYpxJBrTCg4eTVsMzYlvN8JHeY7Lo856sdxA35-A_li6eWo9XG4Cl_ZFygEKREmrwBYMH-atqW8TyM_684" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="46" data-original-width="932" height="32" src="https://blogger.googleusercontent.com/img/a/AVvXsEi8VO7_RyXVSlP46YzORpt4Ji9WCs1gcFUZkDOEKEmKZpCpLio7DR57-NX-RJ2vGe2YwLVFnPzLcMFiXKyTr-fkWYpWZfXOD8YMuOkjj8RYhxmYpxJBrTCg4eTVsMzYlvN8JHeY7Lo856sdxA35-A_li6eWo9XG4Cl_ZFygEKREmrwBYMH-atqW8TyM_684=w640-h32" width="640" /></a></div><p></p><p>We can initialize this dictionary inside a "constructor," which is a special function in a dylib which gets run when the dylib gets loaded by dyld. This is a great way to initialize a global.</p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEhojUQIN4e0Lms0tXzo3Omy1UaqorIsIN4y8rRqRa91VF3FG-vrPTriNMW5pdRdOHmiwkOyOMJpznojscNMogcjZBK_XbiobUZM1QplO10_Qn6XBxTpHInSetJ5iQj8WiQeAMhhQbZPQ7fJ1u1MyikzZi4zgfCOSq994Tgxh3FQ7IdXIozVQMj6se7-_v3y" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="198" data-original-width="1296" height="98" src="https://blogger.googleusercontent.com/img/a/AVvXsEhojUQIN4e0Lms0tXzo3Omy1UaqorIsIN4y8rRqRa91VF3FG-vrPTriNMW5pdRdOHmiwkOyOMJpznojscNMogcjZBK_XbiobUZM1QplO10_Qn6XBxTpHInSetJ5iQj8WiQeAMhhQbZPQ7fJ1u1MyikzZi4zgfCOSq994Tgxh3FQ7IdXIozVQMj6se7-_v3y=w640-h98" width="640" /></a></div><p></p><p>Inside <span style="font-family: courier;">my_objc_rootAutorelease()</span>, we can just add information to the dictionary. 
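</p>

<p>The constructor-plus-global plumbing here is plain C, nothing Objective-C-specific. A minimal sketch (names invented; the real version would allocate the pointer-to-info-string dictionary instead of flipping a flag):</p>

```c
#include <assert.h>
#include <stdio.h>

/* A dylib "constructor": __attribute__((constructor)) marks a function
 * that the dynamic loader runs when the library is loaded, before
 * anything else uses it -- a handy place to initialize globals like our
 * tracking table. */
static int table_initialized = 0;

__attribute__((constructor))
static void init_tracking_table(void) {
    table_initialized = 1; /* real version: create the dictionary here */
    fprintf(stderr, "tracking table ready\n");
}

int tracking_table_ready(void) { return table_initialized; }
```

<p>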
Then, when the crash occurs, we can print the dictionary and find information about the thing that was autoreleased.</p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEh-FzMtyrOypmPMt9_zRrtGStDz5KRCqtdPPwQLKxPUXRMM1vJNU7zxI-cCmDQtTkLboHx-oO11xi8YSyMvtgi5dqKFge9aRCmivnIrYTcy-1sFxzjCgLHAr13DCCpsd6bJ_lC3Lmbqk-ylYAn3YUWPDEttwevaqWLM8aghslPuRrPC2reQeXSYR5OMOeho" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="54" data-original-width="264" height="130" src="https://blogger.googleusercontent.com/img/a/AVvXsEh-FzMtyrOypmPMt9_zRrtGStDz5KRCqtdPPwQLKxPUXRMM1vJNU7zxI-cCmDQtTkLboHx-oO11xi8YSyMvtgi5dqKFge9aRCmivnIrYTcy-1sFxzjCgLHAr13DCCpsd6bJ_lC3Lmbqk-ylYAn3YUWPDEttwevaqWLM8aghslPuRrPC2reQeXSYR5OMOeho=w640-h130" width="640" /></a></div><p></p><p>However, something is wrong...</p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEibrhCLuHaoY2KtrJfOISsKF7BY0uZrRo6kdQDUMTX1zpO0eb7zaNj-83tatbAXpnI53bpf8YNCRUGgULdSPwAET6erNlVnIa_eE-3LSzlODUh0n6hm-ON6qAIwjr6nkgbILMVifUwH58x-3_Sk_KAmMiwW3jM1wXwp7PjxgajyUWCfu2-mn0MS5b4tgf-_" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="396" data-original-width="1452" height="174" src="https://blogger.googleusercontent.com/img/a/AVvXsEibrhCLuHaoY2KtrJfOISsKF7BY0uZrRo6kdQDUMTX1zpO0eb7zaNj-83tatbAXpnI53bpf8YNCRUGgULdSPwAET6erNlVnIa_eE-3LSzlODUh0n6hm-ON6qAIwjr6nkgbILMVifUwH58x-3_Sk_KAmMiwW3jM1wXwp7PjxgajyUWCfu2-mn0MS5b4tgf-_=w640-h174" width="640" /></a></div><p></p><p>The dictionary only holds 315 items. That can't possibly be right - it's inconceivable that only 315 things got autoreleased.</p><h3 style="text-align: left;">The Fifth Thing That Didn't Work</h3><p>We're close - we just need to figure out why so few things got autoreleased. 
Let's verify our assumption that <span style="font-family: courier;">[foo autorelease]</span> actually calls <span style="font-family: courier;">_objc_rootAutorelease()</span> by writing such code and looking at its disassembly.</p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEi6GIb2jv9tLu6NKiLRTc5N8spGg1J2BxX1UJyEtDzHhKlVkUnQWPYUJ5CHKzKmAH_ylJzpdDdBq6iCHbmgDqF8jUSub-bRfMJdnFLDJdA-pdGRrWO3PyA4IfRbePz9mHe-cgGWn8RgviEl5QIbiqWKFA1pUFlOuUHjEGX9lEin_Fs9oHnoVK1fdUeFxodj" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="190" data-original-width="964" height="126" src="https://blogger.googleusercontent.com/img/a/AVvXsEi6GIb2jv9tLu6NKiLRTc5N8spGg1J2BxX1UJyEtDzHhKlVkUnQWPYUJ5CHKzKmAH_ylJzpdDdBq6iCHbmgDqF8jUSub-bRfMJdnFLDJdA-pdGRrWO3PyA4IfRbePz9mHe-cgGWn8RgviEl5QIbiqWKFA1pUFlOuUHjEGX9lEin_Fs9oHnoVK1fdUeFxodj=w640-h126" width="640" /></a></div><p></p><p>And if you look at the disassembly...</p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEhFUDwNuRL_BpEnBLd6XF40mKvRijvCAK1enqQBLsDUH5uc3DFMErKqraYPH-xvGqcZfA1Ebgwasj02Od7_1lLDIwsv32uWEBiNGs0lr2SmIZJZREl2B886vp5Qh-3tSL3tae-ZD-ant12LpVjxG5VaRgOlh7u-nmP3c_-SFZ9dfKJRhwq20eSOeu6jbjdw" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="308" data-original-width="1686" height="116" src="https://blogger.googleusercontent.com/img/a/AVvXsEhFUDwNuRL_BpEnBLd6XF40mKvRijvCAK1enqQBLsDUH5uc3DFMErKqraYPH-xvGqcZfA1Ebgwasj02Od7_1lLDIwsv32uWEBiNGs0lr2SmIZJZREl2B886vp5Qh-3tSL3tae-ZD-ant12LpVjxG5VaRgOlh7u-nmP3c_-SFZ9dfKJRhwq20eSOeu6jbjdw=w640-h116" width="640" /></a></div><p></p><p>You can see 2 really interesting things: the call to <span style="font-family: courier;">alloc</span> and <span style="font-family: courier;">init</span> got compressed to a single C call to <span style="font-family: 
courier;">objc_alloc_init()</span>, and the call to <span style="font-family: courier;">autorelease</span> got compressed to a single C call to <span style="font-family: courier;">objc_autorelease()</span>. I suppose the Objective-C compiler knows about the <span style="font-family: courier;">autorelease</span> message, and is smart enough to not invoke the entire <span style="font-family: courier;">objc_msgSend()</span> infrastructure for it, but instead just emits a raw C call for it. So that means we've interposed the wrong function - we were interposing <span style="font-family: courier;">_objc_rootAutorelease()</span> when we should have been interposing <span style="font-family: courier;">objc_autorelease()</span>. So let's interpose both:</p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEiQ7l7LuAZL8Nv6UHXvzYXHrVXE46qXFHVpbl-MXRUw2HEA_9WETlEWZww7TpTxQKeYOpX5XEFpUkzHkgaDQ1bvYQHctOfla0f9MIUziXmHaOsDl9oKRUfPkV78fhfLIYJu61IH0XMWuLA9G57z5A3oFbQwXUctjiXFp_SwGX925VhG4HbGsQEuohWwFdPy" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="82" data-original-width="1088" height="48" src="https://blogger.googleusercontent.com/img/a/AVvXsEiQ7l7LuAZL8Nv6UHXvzYXHrVXE46qXFHVpbl-MXRUw2HEA_9WETlEWZww7TpTxQKeYOpX5XEFpUkzHkgaDQ1bvYQHctOfla0f9MIUziXmHaOsDl9oKRUfPkV78fhfLIYJu61IH0XMWuLA9G57z5A3oFbQwXUctjiXFp_SwGX925VhG4HbGsQEuohWwFdPy=w640-h48" width="640" /></a></div><p></p><p>This, of course, almost worked - we just have to be super sure that <span style="font-family: courier;">my_objc_autorelease()</span> doesn't accidentally call <span style="font-family: courier;">autorelease</span> on any object - that would cause infinite recursion.</p><p></p><div class="separator" style="clear: both; text-align: center;"><a
href="https://blogger.googleusercontent.com/img/a/AVvXsEhP26OtHj8vF-wPYtgCvmV48liYp77nncM-F_s-H6l7ssB7yI5QsuIu5Px4M4yFsIYeDrZKZ9TzqocNNBuhncTsc_CjI89-rWtv1GpBwjwc3nCr9dqPctLO2Ftjxl_RVffsX2DaZQ4L5oro8xyExbmosYWhuA2nKk-TXXkTKlK4K0VvlV3YA4F798gZrUx3" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="1318" data-original-width="1830" height="460" src="https://blogger.googleusercontent.com/img/a/AVvXsEhP26OtHj8vF-wPYtgCvmV48liYp77nncM-F_s-H6l7ssB7yI5QsuIu5Px4M4yFsIYeDrZKZ9TzqocNNBuhncTsc_CjI89-rWtv1GpBwjwc3nCr9dqPctLO2Ftjxl_RVffsX2DaZQ4L5oro8xyExbmosYWhuA2nKk-TXXkTKlK4K0VvlV3YA4F798gZrUx3=w640-h460" width="640" /></a></div><p></p><h3 style="text-align: left;">The Sixth Thing That Didn't Work</h3><p>Avoiding calling <span style="font-family: courier;">autorelease</span> inside <span style="font-family: courier;">my_objc_autorelease()</span> is actually pretty much impossible, because anything interesting you could log about an object will, almost necessarily, call <span style="font-family: courier;">autorelease</span>. Remember that we're logging information about literally every object which gets autoreleased, which is, in effect, every object in the entire world. Even if you call <span style="font-family: courier;">NSStringFromClass([object class])</span> that will still cause something to be autoreleased.</p><p>So, the solution is to set some global state for the duration of the call to <span style="font-family: courier;">my_objc_autorelease()</span>. If we see a call to <span style="font-family: courier;">my_objc_autorelease()</span> while the state is set, that means we're autoreleasing inside being autoreleased, and we can skip our custom logic and just call the underlying <span style="font-family: courier;">objc_autorelease()</span> directly. 
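</p>

<p>In C, that guard looks roughly like the sketch below. Note that it already uses per-thread rather than truly global state; the next paragraph explains why that's necessary. All names here are invented, and returning the object stands in for forwarding to the real <span style="font-family: courier;">objc_autorelease()</span>.</p>

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

/* Per-thread reentrancy flag, built on the pthreads thread-specific API. */
static pthread_key_t in_hook_key;

__attribute__((constructor))
static void init_guard(void) {
    pthread_key_create(&in_hook_key, NULL); /* key is set up exactly once */
}

static int logged_calls = 0;

void *my_autorelease_hook(void *obj) {
    if (pthread_getspecific(in_hook_key) != NULL) {
        /* Re-entered while logging: skip our logic and just forward
         * (stands in for calling the real objc_autorelease() directly). */
        return obj;
    }
    pthread_setspecific(in_hook_key, (void *)1);
    logged_calls++;           /* stands in for building the info string,  */
    my_autorelease_hook(obj); /* which itself autoreleases -> nested call */
    pthread_setspecific(in_hook_key, NULL);
    return obj;
}
```

<p>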
However, there's a caveat: this "global" state can't actually be global, because Objective-C objects are created and retained and released on every thread, which means this state has to be thread-local. Therefore, because we're writing in Objective-C and not C++, we must use the pthreads API. The pthreads thread-specific API uses a "key" which has to be set up once, so we can do that in our constructor:<br /></p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEgfAPpMoligBURoBbfINEDsJir5NyQB56rbQUjLXo8TJKxg4OjK4q-wvfIuHDTCB2qV3iz4CFq_iGOT82eHrLDidWyeS9Tk9LoVkBZEbnOjFvZlu2ffWruXzjF24kGutD_aBwHk5lxTyiHQlsYsXumoTd_tnUGE_98UID3QX0s-cTFRoKKztmmhW4hezYbD" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="272" data-original-width="1298" height="134" src="https://blogger.googleusercontent.com/img/a/AVvXsEgfAPpMoligBURoBbfINEDsJir5NyQB56rbQUjLXo8TJKxg4OjK4q-wvfIuHDTCB2qV3iz4CFq_iGOT82eHrLDidWyeS9Tk9LoVkBZEbnOjFvZlu2ffWruXzjF24kGutD_aBwHk5lxTyiHQlsYsXumoTd_tnUGE_98UID3QX0s-cTFRoKKztmmhW4hezYbD=w640-h134" width="640" /></a></div><p></p><p>Then we can use <span style="font-family: courier;">pthread_setspecific()</span> and <span style="font-family: courier;">pthread_getspecific()</span> to determine if our calls are being nested.</p><p>Except this still didn't actually work, because <span style="font-family: courier;">abort()</span> is being called...</p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEg-C_Ridv61b_Fek2dR1BEhcF1OYhQFzd85FC7BJbFd_8C3LcGPg-cARUyvUygXaZ5tMxr8YhKWwBl9Qe5_qe20lnLgZ_OAct2p_uvT9lT7Y5d9nI2hh5RpZOgxfEE1ymJ5p6okEYwU7fLBllINGpqyrdxpudcgnpyNtRphbpgUo3y2FWHJ9rKwQWt8Dhxj" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="748" data-original-width="828" height="577"
src="https://blogger.googleusercontent.com/img/a/AVvXsEg-C_Ridv61b_Fek2dR1BEhcF1OYhQFzd85FC7BJbFd_8C3LcGPg-cARUyvUygXaZ5tMxr8YhKWwBl9Qe5_qe20lnLgZ_OAct2p_uvT9lT7Y5d9nI2hh5RpZOgxfEE1ymJ5p6okEYwU7fLBllINGpqyrdxpudcgnpyNtRphbpgUo3y2FWHJ9rKwQWt8Dhxj=w640-h577" width="640" /></a></div><p></p><h3 style="text-align: left;">The Seventh Thing That Didn't Work</h3><p>Luckily, when <span style="font-family: courier;">abort()</span> is called, Xcode shows us a pending Objective-C exception:</p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEgjns12D7Kc86LxjgrU0YBbuG15Ojlqw1RtaQ1HpdYBE0uzDZpG1XMACG6VZQtNwwREjKgivY4RVpN-PJcpjjGOHdMsKWmEpzPFnJDX6MeY1Z-QPuA6soylyWTU7iM4O41DQ8j_nFe33T7i4l1AhTIQDszuanE-6yB6_OQ_jIi_hPJirvk7FLIKlFHGN_mk" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="118" data-original-width="1178" height="64" src="https://blogger.googleusercontent.com/img/a/AVvXsEgjns12D7Kc86LxjgrU0YBbuG15Ojlqw1RtaQ1HpdYBE0uzDZpG1XMACG6VZQtNwwREjKgivY4RVpN-PJcpjjGOHdMsKWmEpzPFnJDX6MeY1Z-QPuA6soylyWTU7iM4O41DQ8j_nFe33T7i4l1AhTIQDszuanE-6yB6_OQ_jIi_hPJirvk7FLIKlFHGN_mk=w640-h64" width="640" /></a></div><p></p><p>Okay, something is being set to nil when it shouldn't be. 
Let's set an exception breakpoint to see what is being set wrong:</p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEgAgfnrzkfs3j8UA9A4VXJnMOQY8QeFkihd_DoyeMWajcUgPlMKw6Ixv69OgSBDY0bYMZJmRUJdKg2cm52ZI0RsSuf9M_489oraTNYG3AxW1IhjQ7IsDst0wCFVwHw3OYYmTXmQeOC3NLdK_dG0NsbFBy4fKRrPT5UkhB5QSThbpTZ0oqHyCWOqWYORiP8u" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="470" data-original-width="1250" height="240" src="https://blogger.googleusercontent.com/img/a/AVvXsEgAgfnrzkfs3j8UA9A4VXJnMOQY8QeFkihd_DoyeMWajcUgPlMKw6Ixv69OgSBDY0bYMZJmRUJdKg2cm52ZI0RsSuf9M_489oraTNYG3AxW1IhjQ7IsDst0wCFVwHw3OYYmTXmQeOC3NLdK_dG0NsbFBy4fKRrPT5UkhB5QSThbpTZ0oqHyCWOqWYORiP8u=w640-h240" width="640" /></a></div><p></p><p>Welp. It turns out <span style="font-family: courier;">NSStringFromClass([object class])</span> can sometimes return nil...</p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEj-k1dltjXY5AK_tAPiX8df0tsOIkoTpQgE1_yjU6p7GAX3_Vxbz0Q_4Nq6IGkPb3DjopvPnlI2sdvsftd7h7lEAlzfbOXIPKgGsVKr3401efvPN0-DlkQWc8Df97Y2otxb9BAWmLjLFeXJzq-3XPYzIVGIFomB_7hDbI_-QZwfTSwkT68QTKL03OK-0mKg" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="536" data-original-width="2672" height="128" src="https://blogger.googleusercontent.com/img/a/AVvXsEj-k1dltjXY5AK_tAPiX8df0tsOIkoTpQgE1_yjU6p7GAX3_Vxbz0Q_4Nq6IGkPb3DjopvPnlI2sdvsftd7h7lEAlzfbOXIPKgGsVKr3401efvPN0-DlkQWc8Df97Y2otxb9BAWmLjLFeXJzq-3XPYzIVGIFomB_7hDbI_-QZwfTSwkT68QTKL03OK-0mKg=w640-h128" width="640" /></a></div><p></p><h3 style="text-align: left;">The Eighth Thing That Worked</h3><p>Okay, let's fix that by checking for nil and using <span style="font-family: courier;">[NSNull null]</span>. 
Now, the program actually crashes in the right place!</p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEhXuW5yQ4bzbOptp-su4bZVqs5QBR63UnS6sNq3yteubVh2BTA9NXtyJexXCPvghGdNBki9aEQU-8gqRT01RUeV4v5i6aTGdfxRLUaYLluUyc2WlKmdJl_7r2mAhgy3Xk_iRpQ5Aonk03ENkSeGwvLNMwVpaIQmdUMd6U6EWDIQE0A_4uyDdn1Deek350Ft" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="82" data-original-width="324" height="162" src="https://blogger.googleusercontent.com/img/a/AVvXsEhXuW5yQ4bzbOptp-su4bZVqs5QBR63UnS6sNq3yteubVh2BTA9NXtyJexXCPvghGdNBki9aEQU-8gqRT01RUeV4v5i6aTGdfxRLUaYLluUyc2WlKmdJl_7r2mAhgy3Xk_iRpQ5Aonk03ENkSeGwvLNMwVpaIQmdUMd6U6EWDIQE0A_4uyDdn1Deek350Ft=w640-h162" width="640" /></a></div><p></p><p>That's more like it. Let's see what the pointer we're looking for is...</p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEji82vSEIOSwWid1IVXWHz_15P0lUp4tHXic48AAw92gnL_gLZgpbresnNADuKr64DyEy9MKGOIohdurgd_bqCQj8VPBxvf0P7v8Zm9Y5T03guQOuGweUov9NZMDZL_HwNeUrmgWPXpYPx24xr28oK7zQ0caj1_xl5sbxITshXF9-1rMkoHHTFRUWTmxPyr" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="136" data-original-width="324" height="134" src="https://blogger.googleusercontent.com/img/a/AVvXsEji82vSEIOSwWid1IVXWHz_15P0lUp4tHXic48AAw92gnL_gLZgpbresnNADuKr64DyEy9MKGOIohdurgd_bqCQj8VPBxvf0P7v8Zm9Y5T03guQOuGweUov9NZMDZL_HwNeUrmgWPXpYPx24xr28oK7zQ0caj1_xl5sbxITshXF9-1rMkoHHTFRUWTmxPyr" width="320" /></a></div><p></p><p>Okay, let's look for it in bigDict!</p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEjyOUeKAHGM69J96eUrfv6gRGyt5_paiYpLyVaTPIUod73weOYVKDTRaYNzjfs7Yr5m7zctwlXQkNCFRULamGTRg-h8yniXHBbgqvnkO17KF0RniYJkcCqAO9U4FOI2Cv7XeqIAeYlOH0sxBApE61zUfjEPtwws6g2WL4dHtqCIbvdyLuFJYBzMg-YVmjnw" style="margin-left: 1em; margin-right: 
1em;"><img alt="" data-original-height="452" data-original-width="1552" height="186" src="https://blogger.googleusercontent.com/img/a/AVvXsEjyOUeKAHGM69J96eUrfv6gRGyt5_paiYpLyVaTPIUod73weOYVKDTRaYNzjfs7Yr5m7zctwlXQkNCFRULamGTRg-h8yniXHBbgqvnkO17KF0RniYJkcCqAO9U4FOI2Cv7XeqIAeYlOH0sxBApE61zUfjEPtwws6g2WL4dHtqCIbvdyLuFJYBzMg-YVmjnw=w640-h186" width="640" /></a></div><p></p><p>Woohoo! Finally some progress. The object being autoreleased is an <span style="font-family: courier;">NSDictionary</span>.</p><p>But that's not enough, though. What we really want is a backtrace. We can't use lldb's backtrace because it's too slow, but luckily macOS has a <span style="font-family: courier;">backtrace()</span> <a href="https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man3/backtrace.3.html">function</a> which gives us backtrace information! Let's build a string out of the backtrace information:</p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEgPOwjz-IjNczdMO53JH4jz0Fva7GwNlWevVK8pmnI26Q11VCW728Uj6MYW3gukkP2tTqJOdrIhOF5kk53JIHKINWz46o2dtyHZ46owWR8CLc8VB4DmCohoQXsMR-ZTZkFxHTkF17GxOiODQgOPRypmhgTOOXa6ifcPIjgiCglFh4SbYasR8Lax-pxXxVeL" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="460" data-original-width="1312" height="224" src="https://blogger.googleusercontent.com/img/a/AVvXsEgPOwjz-IjNczdMO53JH4jz0Fva7GwNlWevVK8pmnI26Q11VCW728Uj6MYW3gukkP2tTqJOdrIhOF5kk53JIHKINWz46o2dtyHZ46owWR8CLc8VB4DmCohoQXsMR-ZTZkFxHTkF17GxOiODQgOPRypmhgTOOXa6ifcPIjgiCglFh4SbYasR8Lax-pxXxVeL=w640-h224" width="640" /></a></div><p></p><p>Welp, that's too slow - the program exits. 
Let's try again by setting <span style="font-family: courier;">frameCount</span> to 6:</p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEg0hd7D-S-q9aUs33jmk-RqXpgSScDGy1jYl9wz_P6aEtoW5UTKkk22DFPUPrmqTxW2UBCa0QMh23nuAqY_w9Bd4m6WbAmVC4w5krrMYhDcOdUJqf6v7TpvGHQVvHo-FDipEec6CRI7F0rx9kO1OVxTip4e-FsBDO92Um-6P4pp2La5SVXhOMSu1paVhJ2P" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="462" data-original-width="1314" height="226" src="https://blogger.googleusercontent.com/img/a/AVvXsEg0hd7D-S-q9aUs33jmk-RqXpgSScDGy1jYl9wz_P6aEtoW5UTKkk22DFPUPrmqTxW2UBCa0QMh23nuAqY_w9Bd4m6WbAmVC4w5krrMYhDcOdUJqf6v7TpvGHQVvHo-FDipEec6CRI7F0rx9kO1OVxTip4e-FsBDO92Um-6P4pp2La5SVXhOMSu1paVhJ2P=w640-h226" width="640" /></a></div><p></p><p>So here is the final <span style="font-family: courier;">autorelease</span> function:</p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEgqOAzfcSE7ux1nTBxLRF4s0x5AkP1O1_sKZbeQJbXFJu5pD-n0X3UPYsW3-qzdg1GCQIYaKQJb9hdi4o-ShTFtKdRg1umk7rXVtgp_PWgdqroDCRBq7JP4Lq3FNuXII8G04yuASSzwNSWMEe8NDet5uMEGs8KKUFzB6vtzGBx2rbUXpY0nwY4g9mMPp3PG" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="726" data-original-width="1488" height="312" src="https://blogger.googleusercontent.com/img/a/AVvXsEgqOAzfcSE7ux1nTBxLRF4s0x5AkP1O1_sKZbeQJbXFJu5pD-n0X3UPYsW3-qzdg1GCQIYaKQJb9hdi4o-ShTFtKdRg1umk7rXVtgp_PWgdqroDCRBq7JP4Lq3FNuXII8G04yuASSzwNSWMEe8NDet5uMEGs8KKUFzB6vtzGBx2rbUXpY0nwY4g9mMPp3PG=w640-h312" width="640" /></a></div><p></p><p>Okay, now let's run it, and print out the object we're interested in:</p><p></p><div class="separator" style="clear: both; text-align: center;"><a 
href="https://blogger.googleusercontent.com/img/a/AVvXsEiEnpJB9asbNpkhiEbd8SEtKCKj3PPZWp_zVAUU83WLtdZTUGnYdieAmh83y2kQA6VoVjVRpU6V5E9fx2pFFBN2h7PP8mL5BrUd9M35e-WeUYBXfULgKMo0gqAKgSViMZWTPHMrdL1OOZq_WVWZ2hOyM8MgprjsN9pwpAytRodbaKyHxNXm_yCqFd3Tfdab" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="84" data-original-width="326" height="164" src="https://blogger.googleusercontent.com/img/a/AVvXsEiEnpJB9asbNpkhiEbd8SEtKCKj3PPZWp_zVAUU83WLtdZTUGnYdieAmh83y2kQA6VoVjVRpU6V5E9fx2pFFBN2h7PP8mL5BrUd9M35e-WeUYBXfULgKMo0gqAKgSViMZWTPHMrdL1OOZq_WVWZ2hOyM8MgprjsN9pwpAytRodbaKyHxNXm_yCqFd3Tfdab=w640-h164" width="640" /></a></div><p></p><p>And the bigDict:</p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEhtqBPVvPi6NdPCEhGyXQKXzF6Whf24KuYtUq-bbigDsYfuvrnQNsjSIrHnaMJET02-wB80EIdts8Fb95BxcUkVomBmr7LJsCVXSM8p3gTUd_aOlec5juLkctkGXufGEkcrbt4MOGpTM7k19DfeBCt_HLURjP8-YWNEEQMZzxEhlOdb4WV7NYKPYzoyUIUl" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="400" data-original-width="1550" height="166" src="https://blogger.googleusercontent.com/img/a/AVvXsEhtqBPVvPi6NdPCEhGyXQKXzF6Whf24KuYtUq-bbigDsYfuvrnQNsjSIrHnaMJET02-wB80EIdts8Fb95BxcUkVomBmr7LJsCVXSM8p3gTUd_aOlec5juLkctkGXufGEkcrbt4MOGpTM7k19DfeBCt_HLURjP8-YWNEEQMZzxEhlOdb4WV7NYKPYzoyUIUl=w640-h166" width="640" /></a></div><p></p><p>Woohoo! It's a great success! 
Here's the stack trace, formatted:</p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEitFwEEcfSNM6_ub1PYuLwG5yPVfhFhVIrgehOV6kzoCThmeAr4oOZ7v8uRssCNqk8kAt512Kv3rMiFJG-QYqGB_BilPJtXQI3NCt4CM2Ah76xHYrkIhnWOSqO1CZMRmDs1ONalYmtaPgPwfrbae2TPHRnh-txy4C2BExxtmP9IcsosiBKaUy-5t_4KZ6P5" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="138" data-original-width="1144" height="78" src="https://blogger.googleusercontent.com/img/a/AVvXsEitFwEEcfSNM6_ub1PYuLwG5yPVfhFhVIrgehOV6kzoCThmeAr4oOZ7v8uRssCNqk8kAt512Kv3rMiFJG-QYqGB_BilPJtXQI3NCt4CM2Ah76xHYrkIhnWOSqO1CZMRmDs1ONalYmtaPgPwfrbae2TPHRnh-txy4C2BExxtmP9IcsosiBKaUy-5t_4KZ6P5=w640-h78" width="640" /></a></div><p></p><p>Excellent! This was enough for me to find the place where I had over-released the object.</p>Litherumhttp://www.blogger.com/profile/12738405376090442005noreply@blogger.com0tag:blogger.com,1999:blog-8778351438463999796.post-87895121775154691122023-11-28T00:32:00.000-08:002023-11-28T00:37:37.310-08:00Nvidia SLI from Vulkan's Point of View<p>SLI is an Nvidia technology, which (is supposed to) allow multiple GPUs to act as one. The use case is supposed to be simple: you turn it on, and everything gets faster. However, that's not how it works in Vulkan (because of course it isn't - nothing is simple in Vulkan). So let's dig in and see exactly how it works and what's exposed in Vulkan.</p><h3 style="text-align: left;">Logical Device Creation</h3><p>SLI is exposed in Vulkan with 2 extensions, both of which have been promoted to core in Vulkan 1.1: <span style="font-family: courier;">VK_KHR_device_group_creation</span>, and <span style="font-family: courier;">VK_KHR_device_group</span>. The reason there are 2 is esoteric: one is an "instance extension" and the other is a "device extension." 
Because enumerating device groups has to happen before you actually create a logical device, those enumeration functions can't be part of a device extension, so they're part of the instance extension instead. The instance extension is really small - it essentially just lets you list device groups, and for each group, list the physical devices inside it. When you create your logical device, you just list which physical devices should be part of the new logical device.</p><p>Now that you've created your logical device, there are a few different pieces to how this stuff works.</p><h3 style="text-align: left;">Beginning of the Frame</h3><p>At the beginning of your frame, you would normally call <span style="font-family: courier;">vkAcquireNextImageKHR()</span>, which schedules a semaphore to be signaled when the next swapchain image is "acquired" (which means "able to be rendered to"). (The rest of your rendering is supposed to wait on this semaphore to be signaled.) <span style="font-family: courier;">VK_KHR_device_group</span> replaces this function with <span style="font-family: courier;">vkAcquireNextImage2KHR()</span>, which adds a single parameter: a "device mask" of which physical devices in the logical device should be ready before the semaphore is signaled.</p><p>It took me a while to figure this out, but each physical device gets its own distinct contents of the swapchain image. When you write your Vulkan program, and you bind a swapchain image to a framebuffer, that actually binds <span style="font-family: courier;">n</span> different contents - one on each physical device. When a physical device executes and interacts with the image, it sees its own independent contents of the image.</p><h3 style="text-align: left;">End of the Frame</h3><p>At the end of the frame, you'll want to present, and this is where things get a little complicated. Each physical device in the logical device may or may not have a "presentation engine" in it. 
Also, recall that each physical device has its own distinct contents of the swapchain image.</p><p>There are 4 different presentation "modes" (<span style="font-family: courier;">VkDeviceGroupPresentModeFlagBitsKHR</span>). Your logical device will support some subset of these modes. The 4 modes are:</p><p></p><ol style="text-align: left;"><li>Local presentation: Any physical device with a presentation engine can present, but it can only present the contents of its own image. When you present, you tell Vulkan which physical device and image to present (<span style="font-family: courier;">VkDeviceGroupPresentInfoKHR</span>).</li><li>Remote presentation: Any physical device with a presentation engine can present, and it can present contents from other physical devices. Vulkan exposes a graph (<span style="font-family: courier;">vkGetDeviceGroupPresentCapabilities()</span>) that describes which physical devices can present from which other physical devices in the group. When you present, you tell Vulkan which image to present, and there's a requirement that <i>some</i> physical device with a presentation engine is able to present the image you selected.</li><li>Sum presentation: Any physical device with a presentation engine can present, and it presents the component-wise sum of the contents of the image from multiple physical devices. Again, there's a graph that indicates, for each physical device that has a presentation engine, which other physical devices it's able to sum from. When you present, you specify which physical devices' contents to sum, via a device mask (and there's a requirement that there is some physical device with a presentation engine that can sum from all of the requested physical devices).</li><li>Local multi-device presentation: Different physical devices (with presentation engines) can present different disjoint rects of their own images, which get merged together to a final image.
You can tell which physical devices present which rects by calling <span style="font-family: courier;">vkGetPhysicalDevicePresentRectanglesKHR()</span>. When you present, you specify a device mask, which tells which physical devices present their rects.</li></ol><p></p><p>On my machine, only the local presentation mode is supported, and both GPUs have presentation engines. That means the call to present gets to pick (<span style="font-family: courier;">VkDeviceGroupPresentInfoKHR</span>) which of the two image contents actually gets presented.</p><h3 style="text-align: left;">Middle of the Frame</h3><p>The commands in the middle of the frame are probably the most straightforward. When you begin a command buffer, you can specify a device mask (<span style="font-family: courier;">VkDeviceGroupCommandBufferBeginInfo</span>) of which physical devices will execute the command buffer. Inside the command buffer, when you start a render pass, you can also specify another device mask (<span style="font-family: courier;">VkDeviceGroupRenderPassBeginInfo</span>) for which physical devices will execute the render pass, as well as assigning each physical device its own distinct "render area" rect. Inside the render pass, you can run <span style="font-family: courier;">vkCmdSetDeviceMask()</span> to change the set of currently running physical devices. In your SPIR-V shader, there's even a built-in intrinsic "DeviceIndex" to tell you which GPU in the group you're running on. And then, finally, when you actually submit the command buffer, you can supply (<span style="font-family: courier;">VkDeviceGroupSubmitInfo</span>) a device mask you want to submit the command buffers to.</p><p>There's even a convenience <span style="font-family: courier;">vkCmdDispatchBase()</span> which lets you set "base" values for workgroup IDs, which is convenient if you want to spread one workload across multiple GPUs.
Pipelines have to be created with <span style="font-family: courier;">VK_PIPELINE_CREATE_DISPATCH_BASE_KHR</span> to use this, though.</p><h3 style="text-align: left;">Resources</h3><p>It's all well and good to have multiple physical devices executing the same command buffer, but simply <i>execution</i> is not enough: you also need to bind resources to those shaders and commands that get run.</p><p>When allocating a resource, there are 2 ways for it to happen: either each physical device gets its own distinct contents of the allocation, or all the physical devices share a single contents. If the allocation's heap is marked as <span style="font-family: courier;">VK_MEMORY_HEAP_MULTI_INSTANCE_BIT_KHR</span>, then all allocations will be replicated distinctly across each of the physical devices. Even if the heap isn't marked that way, the individual allocation can still be marked that way (<span style="font-family: courier;">VkMemoryAllocateFlagsInfo</span>). On my device, the GPU-local heap is marked as multi-instance.</p><p>Communication can happen between the devices by using Vulkan's existing memory binding infrastructure. Recall that, in Vulkan, you don't just create a resource; instead, you make an allocation, and a resource, and then you separately bind the two together. Well, it's possible to bind a resource on one physical device with an allocation on a different physical device (<span style="font-family: courier;">VkBindBufferMemoryDeviceGroupInfo</span>, <span style="font-family: courier;">VkBindImageMemoryDeviceGroupInfo</span>)! When you make one of these calls, it will execute on all the physical devices, so these structs indicate the graph of which resources on which physical devices get bound to which allocations on which (other) physical devices.
For textures, you can even be more fine-grained than this, and bind just a region of a texture across physical devices (assuming you created the image with <span style="font-family: courier;">VK_IMAGE_CREATE_SPLIT_INSTANCE_BIND_REGIONS_BIT_KHR</span>). This also works with sparse resources - when you bind a sparse region of a texture, that sparse region can come from another physical device, too (<span style="font-family: courier;">VkDeviceGroupBindSparseInfo</span>).</p><p>Alas, there are restrictions. <span style="font-family: courier;">vkGetDeviceGroupPeerMemoryFeatures()</span> tells you, once you've created a resource and bound it to an allocation on a different physical device, how you're allowed to use that resource. For each combination of (heap index, local device index, and remote device index), a subset of 4 possible uses will be allowed (<span style="font-family: courier;">VkPeerMemoryFeatureFlagBits</span>):</p><p></p><ol style="text-align: left;"><li>The local device can copy to the remote device</li><li>The local device can copy from the remote device</li><li>The local device can read the resource directly</li><li>The local device can write to the resource directly</li></ol><p></p><p>This is really exciting - if either of the bottom two uses is allowed, it means you can bind one of these cross-physical-device resources to a shader and use it as if it were any normal resource! Even if neither of the bottom two uses is allowed, just being able to copy between devices without having to round-trip through main memory is already cool. On my device, only the first 3 uses are allowed.</p><h3 style="text-align: left;">Swapchain Resources</h3><p>Being able to bind a new resource to a different physical device's allocation is good, but swapchain images come pre-bound, which means that mechanism won't work for swapchain resources.
So there's a new mechanism for that: it's possible to bind a new image to the storage for an existing swapchain image (<span style="font-family: courier;">VkBindImageMemorySwapchainInfoKHR</span>). This can be used in conjunction with <span style="font-family: courier;">VkBindImageMemoryDeviceGroupInfo</span> which I mentioned above, to make the allocations cross physical devices.</p><p>So, if you want to copy from one physical device's swapchain image to another physical device's swapchain image, what you'd do is:</p><p></p><ol style="text-align: left;"><li>Create a new image (of the right size, format, etc.). Specify <span style="font-family: courier;">VkImageSwapchainCreateInfoKHR</span> to indicate its storage will come from the swap chain.</li><li>Bind it (<span style="font-family: courier;">VkBindImageMemoryInfo</span>), but use both...</li><ol><li><span style="font-family: courier;">VkBindImageMemorySwapchainInfoKHR</span> to have its storage come from the swap chain, and</li><li><span style="font-family: courier;">VkBindImageMemoryDeviceGroupInfo</span> to specify that its storage comes from <i>another physical device's swap chain contents</i></li></ol><li>Execute a copy command to copy from one image to the other image.</li></ol><p></p><h3 style="text-align: left;">Conclusion</h3><p>It's a pretty complicated system! Certainly much more complicated than SLI is in Direct3D. It seems like there are 3 core benefits of device groups:</p><p></p><ol style="text-align: left;"><li>You can execute the same command stream on multiple devices without having to re-encode it multiple times or call into Vulkan multiple times for each command. The device masks implicitly duplicate the execution.</li><li>There are a variety of presentation modes, which allow automatic merging of rendering results, without having to explicitly execute a render pass or a compute shader to merge the results. 
Unfortunately, my cards don't support this.</li><li>Direct physical-device-to-physical-device communication, without round-tripping through main memory. Indeed, for some use cases, you can just bind a remote resource and use it as if it were local. Very cool!</li></ol><p></p><p>I'm not quite at the point where I can run some benchmarks to see how much SLI improves performance over simply creating two independent Vulkan logical devices. I'm working on a ray tracer, so there are a few different ways of joining the rendering results from the two GPUs. To avoid seams, the denoiser will probably have to run on just one of the GPUs.</p>Litherumhttp://www.blogger.com/profile/12738405376090442005noreply@blogger.com0tag:blogger.com,1999:blog-8778351438463999796.post-19321614113449094712023-10-29T22:23:00.006-07:002023-10-29T22:28:47.649-07:00My First Qt App<p>Just for fun, I wanted to try to make a Qt app that graphs some data. For contrast, I'm aware of <a href="https://developer.apple.com/documentation/Charts">Swift Charts</a>, and I thought using Qt to graph some stuff would be a fun little project, now that I'm using FreeBSD full time, rather than macOS. The latest version of Qt is version 6, so that's what I'll be using.</p><h3 style="text-align: left;">Basics<br /></h3><p>When you use Qt Creator to create a new Qt project, it only creates 4 files:</p><ul style="text-align: left;"><li>CMakeLists.txt</li><li>CMakeLists.txt.user</li><li>main.cpp</li><li>Main.qml</li></ul><p>Qt Creator can understand CMakeLists.txt directly - if you want to open the "project," you open that file. Just like with Cocoa programming, main.cpp doesn't contain much inside it - it's just a few lines long, and it initializes the existing infrastructure to load the app's UI.</p><p>Also, like Cocoa programming, most of the app's UI is described declaratively, in the .qml file.
The way this works is you say something like:</p><div style="text-align: left;"><span style="font-family: courier;">Foo {</span></div><div style="text-align: left;"><span style="font-family: courier;"> bar: baz <br /></span></div><div style="text-align: left;"><span style="font-family: courier;">}</span></div><p>And this means "when the QML file is loaded, create an instance of type Foo, and set the property named bar on this new object to a value of baz."</p><p>The outermost level is this:</p><p><span style="font-family: courier;">Window {<br /> width: 640<br /> height: 480<br /> visible: true<br /> title: qsTr("Hello World")<br />}</span></p><p>Then, you can add "elements" inside the window, by placing it inside the <span style="font-family: courier;">{}</span>s. There are collection views (<span style="font-family: courier;">Row</span>, <span style="font-family: courier;">Column</span>, <span style="font-family: courier;">Grid</span>, <span style="font-family: courier;">Flow</span>) which define how to lay out their children, and there are also more general elements like <span style="font-family: courier;">Rectangle</span>. When your layout is not naturally specified (because you're not using containers or whatever), you describe the layout using anchors, like <span style="font-family: courier;">anchors.centerIn: parent</span> or <span style="font-family: courier;">anchors.fill: parent</span>.</p><h3 style="text-align: left;">Qt Charts</h3><p>Qt has a <a href="https://doc.qt.io/qt-6/qtcharts-overview.html">built-in chart element</a>, so the first thing I did was just copy the ChartView example directly into my QML document as a child of the Window. However, that didn't work, and some searching found <a href="https://doc.qt.io/qt-6/qtcharts-index.html">this note</a>:</p><p>> Note: An instance of QApplication is required for the QML types as the module depends on Qt's Graphics View Framework for rendering. QGuiApplication is not sufficient. 
However, projects created with Qt Creator's Qt Quick Application wizard are based on the Qt Quick template that uses QGuiApplication by default. All the QGuiApplication instances in such projects must be replaced with QApplication.</p><p>Okay, so I replaced <span style="font-family: courier;">QGuiApplication</span> with <span style="font-family: courier;">QApplication</span> in main.cpp, and changed <span style="font-family: courier;">#include <QGuiApplication></span> to <span style="font-family: courier;">#include <QApplication></span>, only to find that there is now a compile error: the compiler can't find that file. After some more searching, it turns out I needed to change this:</p><p><span style="font-family: courier;">find_package(Qt6 6.5 REQUIRED COMPONENTS Quick)</span> </p><p>to</p><p><span style="font-family: courier;">find_package(Qt6 6.5 REQUIRED COMPONENTS Quick Widgets)</span> </p><p>and change </p><p><span style="font-family: courier;">target_link_libraries(appGrapher<br /> PRIVATE Qt6::Quick<br />)</span></p><p>to</p><p><span style="font-family: courier;">target_link_libraries(appGrapher<br /> PRIVATE Qt6::Quick<br /> PRIVATE Qt6::Widgets<br />)</span></p><p>Huh. After doing that, it worked no problem.</p><h3 style="text-align: left;">Data Source (C++ interop)<br /></h3><p>So now I have a chart, which is pretty cool, but the data that the chart uses is spelled out literally in the QML file. That's not very useful - I plan on generating thousands of data points, and I don't want to have to put them inline inside this QML thing. Instead, I want to load them from an external source.</p><p>QML files allow you to run JavaScript by literally placing bits of JavaScript inside the QML file, but I think I want to do better - I want my data source to come from C++ code, so I have full freedom about how I generate it.
From some searching, it looks like there are 2 ways of having C++ and QML JavaScript interoperate:</p><ul style="text-align: left;"><li>You can <a href="https://doc.qt.io/qt-6/qqmlengine.html#qmlRegisterSingletonInstance">register</a> a singleton, or a singleton instance, and then the JavaScript can call methods on that singleton</li><li>You can <a href="https://doc.qt.io/qt-6/qqmlengine.html#qmlRegisterType">register</a> a type, and have the QML create an instance of that type, just like any other element</li><li>(You can <a href="https://doc.qt.io/qt-6/qqmlcontext.html#setContextProperty">setContextProperty()</a>, which lets the QML look up an instance that you set ahead of time. However, there's a note that says "You should not use context properties to inject values into your QML components" which is exactly what I'm trying to do, so this probably isn't the right solution.)</li></ul><p>I have a general aversion to singletons, and I think registering a type is actually what I want, because I want the QML infrastructure to own the instance and define its lifetime, so that's the approach I went with. The way you do this is, in <span style="font-family: courier;">main()</span> after you create the <span style="font-family: courier;">QApplication</span> but before you do anything else, you call <span style="font-family: courier;">qmlRegisterType()</span>. Here is what <span style="font-family: courier;">main()</span> says:</p><p><span style="font-family: courier;">qmlRegisterType<DataSource>("com.litherum", 1, 0, "DataSource");</span></p><p>This allows the QML to say <span style="font-family: courier;">import com.litherum</span>, which is pretty cool.</p><h3 style="text-align: left;">QObject<br /></h3><p>Defining the <span style="font-family: courier;">DataSource</span> type in C++ is a bit weird. It turns out that Qt objects are not just regular C++ objects. 
Instead, you write your classes in a different language, which is similar to C++, and then there is a "meta-object compiler" which will compile your source to actual C++. It <a href="https://doc.qt.io/qt-6/metaobjects.html">looks like</a> the main purpose of this is to be able to connect signals and slots, where an object can emit a signal, and if a slot in some other object is connected to that signal, then the slot callback gets run in that other object. It seems pretty similar to observers in Objective-C. They also have the ability to perform introspection, like Objective-C. I kind of don't understand why they didn't just invent a real language rather than doing this C++ transpilation silliness.</p><p>Anyway, you can define your (not-)C++ class, inherit from <span style="font-family: courier;">QObject</span>, annotate the class with <span style="font-family: courier;">Q_OBJECT</span> and <span style="font-family: courier;">QML_ELEMENT</span>, and give it a method with the <span style="font-family: courier;">Q_INVOKABLE</span> annotation. Sure, fine. Then, in the QML file, you can add a stanza which tells the system to create an instance of this class, and you can use the <span style="font-family: courier;">Component.onCompleted</span> JavaScript handler to call into it (via its id). Now you can call the C++ method you just defined from within the QML. Cool. This is what the C++ header says:</p><p><span style="font-family: courier;">class DataSource : public QObject<br />{<br /> Q_OBJECT<br /> QML_ELEMENT<br />public:<br /> explicit DataSource(QObject *parent = nullptr);<br /><br /> Q_INVOKABLE void updateData(QXYSeries*, double time);<br />};</span> <br /></p><p>Okay, the method is supposed to set the value of the <span style="font-family: courier;">SplineSeries</span> in the chart. The most natural way to do this is to pass the <span style="font-family: courier;">SplineSeries</span> into the C++ function as a parameter.
This is actually pretty natural - all the QML types have corresponding C++ types, so you just make the C++ function accept a <span style="font-family: courier;">QSplineSeries*</span>. Except we run into the same compiler error where the compiler can't find <span style="font-family: courier;">#include <QSplineSeries></span>. It turns out that in CMakeLists.txt we have to make a similar addition and add <span style="font-family: courier;">Charts</span> to both places that we added <span style="font-family: courier;">Widgets</span> above. Fine. Here's what the QML says:<br /></p><p> <span style="font-family: courier;">DataSource {<br /> id: dataSource<br /> Component.onCompleted: function() {<br /> dataSource.updateData(splineSeries, Date.now());<br /> }<br />} </span></p><p>Once you do this, it actually works out well - the C++ code can call methods on the <span style="font-family: courier;">QSplineSeries</span>, and it can see the values that have been set in the QML. It can generate a <span style="font-family: courier;">QList<QPointF></span> and call <span style="font-family: courier;">QSplineSeries::replace()</span> with the new list.</p><p>The one thing I couldn't get it to do was automatically rescale the charts' axes when I swap in new data with different bounds. Oh well.<br /></p><p>I did want to go one step further, though!</p><h3 style="text-align: left;">Animation</h3><p>One of the coolest things about retained-mode UI toolkits is that they often allow for animations for free. Swapping out the data in the series should allow Qt to smoothly animate from the first data set to the second. And it actually totally worked! 
It took me a while to figure out how specifically to spell the values, but in the QML file, you can set these on the <span style="font-family: courier;">ChartView</span>:</p><div style="text-align: left;"><span style="font-family: courier;">animationOptions: ChartView.AllAnimations</span></div><div style="text-align: left;"><span style="font-family: courier;">animationDuration: 300 // milliseconds<br /></span></div><div style="text-align: left;"><span style="font-family: courier;">animationEasingCurve {</span></div><div style="text-align: left;"><span style="font-family: courier;"> type: Easing.InOutQuad</span></div><div style="text-align: left;"><span style="font-family: courier;">}</span></div><p>I found these by looking at the <a href="https://doc.qt.io/qt-6/qchart.html">documentation</a> for QChart. And, lo and behold, changing the data values smoothly animated the spline on the graph! I also needed some kind of timer to actually call my C++ function to generate new data, which you do with QML also:</p><p><span style="font-family: courier;">Timer {<br /> interval: 1000 // milliseconds<br /> running: true<br /> repeat: true<br /> onTriggered: function() {<br /> dataSource.updateData(splineSeries, Date.now());<br /> }<br />} </span><br /></p><p>Super cool stuff! I'm always impressed when you can enable animations in a declarative way, without having your own code running at 60fps. Also, while the animations are running, from watching KSysGuard, it looks like the rendering is multithreaded, which is super cool too! (And, I realized that KSysGuard probably uses Qt Charts under the hood too, to show its performance graphs.)<br /></p><h3 style="text-align: left;">Conclusion</h3><p>It looks like Qt Charts is pretty powerful, has lots of options to make it beautiful, and is fairly performant (though I didn't rigorously test the performance).
Using it did require creating a whole Qt application, but the application is super small, only has a few files, and each file is pretty small and understandable. And, being able to make arbitrary dynamic updates over time while getting animation for free was pretty awesome. I think being able to describe most of the UI declaratively, rather than having to describe it all 100% in code, is definitely a good design decision for Qt. And the C++ interop story was a little convoluted (having to touch <span style="font-family: courier;">main()</span> is a bit unfortunate) but honestly not too bad in the end.<br /></p>Litherumhttp://www.blogger.com/profile/12738405376090442005noreply@blogger.com0tag:blogger.com,1999:blog-8778351438463999796.post-38746261902252307272023-10-28T18:26:00.001-07:002023-10-28T18:26:37.926-07:00ReSTIR Part 2: Characterizing Sample Reuse<p>After enumerating all the <a href="https://litherum.blogspot.com/2023/10/restir-part-1-building-blocks.html">building blocks of ReSTIR</a>, there isn't actually that much more. The <a href="https://en.wikipedia.org/wiki/Rendering_equation">rendering equation</a> is an integral, and our job is to approximate the value of the integral by sampling it in the most intelligent way possible. <br /></p><p>Importance sampling tells us that we want to generate samples with a density that's proportional to the contribution of those samples to the value of the final integral. (So, where the light is strongest, sample that with highest density.) We can't directly produce samples with this probability density function, though - if we could, we could just compute the integral directly rather than dealing with all this sampling business.</p><p>The function being integrated in the rendering equation is the product of a few independent functions:</p><div><ul style="text-align: left;"><li>The BRDF (BSDF) function, which is a property of the material we are rendering,</li><li>The distribution of incoming light. 
For direct illumination, this is distributed over the relevant light sources</li><li>A geometry term, where the orientation of the surface(s) affects the result</li><li>A visibility term (the point being shaded might be in shadow)</li></ul>The fact that there are a bunch of independent terms means that Multiple Importance Sampling (MIS) works well - we can use these independent functions to produce a single aggregated "target" function which we expect will approximate the <i>real</i> function fairly well. So, we can generate samples according to the target function, using Sequential Importance Resampling (SIR), evaluate the real function at those sampling locations (by tracing rays or whatever), then use Resampled Importance Sampling (RIS) to calculate an integral. Easy peasy, right?</div><div> </div><div><h3 style="text-align: left;">ReSTIR </h3></div><div><br /></div><div>This is where ReSTIR <i>starts</i>. The first observation that ReSTIR makes is that it's possible to use reservoir sampling (RS) to turn this into a <i>streaming</i> algorithm. The paper assumes that the reservoir only holds a single sample (though this isn't actually necessary). The contents of the reservoir represent a set of (one) sample with pdf proportional to the target function, and the more samples the reservoir encounters, the better that pdf matches the target function. The name of the game, now, is to make the reservoir encounter as many samples as possible.</div><div><br /></div><div>Which brings us to the second observation that ReSTIR makes. Imagine if there was some way of merging reservoirs in constant time (or rather: in time proportional to the size of the reservoirs, rather than time proportional to the number of samples the reservoirs have encountered). 
If this were possible, you could imagine a classic parallel reduction algorithm: each thread (pixel) could start out with a naive reservoir (a poor approximation of your target function), but then adjacent threads could merge their reservoirs, then one-of-every-4-threads could merge their reservoirs, then one-of-every-8, etc, until you have a single result that incorporates results from all the threads. If only a single level (generation) of this reduction occurs each frame, you end up with a result where you perform a constant amount of work each frame per thread, but the result is that an exponential number of samples end up being accumulated. This is the key insight that ReSTIR makes.</div><div> </div><div style="text-align: left;"><h3>Merging Reservoirs</h3></div><div><br /></div><div>Merging reservoirs is a subtle business, though. The problem is that different pixels/threads are shading different materials oriented at different orientations. In effect, your target function you're sampling (and the real function you're evaluating) are different from pixel to pixel. If you ignore this fact, and pretend that all your pixels are all sampling the same thing, you can naively just jam the reservoirs together, by creating a new reservoir which encounters the values saved in the reservoirs of the inputs. This is fast, but gets wrong results (called "bias" in the literature).</div><div><br /></div><div>What you have to do instead is to treat the merging operation with care. The key here lies with the concept of "supports." Essentially, if you're trying to sample a function, you have to be able to generate samples at every place the function is nonzero. If there's an area where the function is nonzero but you never sample that area, your answer will turn out wrong. Well, the samples that one pixel generates (recall that the things in the reservoirs are sample locations) might end up not being applicable to a different pixel. 
For example, consider if there's an occlusion edge where one pixel is in shadow and a nearby pixel isn't. Or, another example: the surface normal varies across the object, and the sample at one pixel is at a sharp angle, such that if you use that same sample at a different pixel, that sample actually points behind the object. You have to account for this in the formulas involved.</div><div> </div><div style="text-align: left;"><h3>Jacobian Determinant </h3></div><div><br /></div><div>There's a generalization of this, which uses the concept of a <a href="https://en.wikipedia.org/wiki/Jacobian_matrix_and_determinant">Jacobian determinant</a>. Recall that, in general, a function describes a relationship between inputs and outputs. The Jacobian determinant of a function describes, for a particular point in the input space of the function, if you make a small perturbation and feed a slightly different input point into the function, how much the output of the function will be perturbed. It's kind of a measure of sensitivity - at a particular point, how sensitive are changes in the output to changes in the input.</div><div><br /></div><div>Well, if you have a sample at one particular pixel of an image, and you then apply it to a different pixel, you have an input (the sample at the original pixel) and you have an output (the sample at the destination pixel) and you have a relationship between the two (the probability of that sample won't be exactly the same at the two different places). So, the Jacobian tells you how to compensate for the fact that you're changing the domain of the sample.</div><div><br /></div><div>In order to incorporate the Jacobian, you have to be able to calculate it (of course), which means you have to be able to characterize how sample reuse across pixels affects the probabilities involved. 
For direct illumination, that's just assumed to be 1 or 0 depending on the value of the sample point - hence why above you just ignore some samples altogether when reusing them. For indirect illumination (path tracing), a sample is an entire path, and when you re-use it at a different pixel, you're producing a new path that is slightly different than the original path. This path manipulation is called "shift mapping" of a path in the gradient domain rendering literature, and common shift mappings have well-defined Jacobian functions associated with them. So, if you spatially reuse a path, you can pick a "shift mapping" for how to define the new path, and then include that shift mapping's Jacobian in the reservoir merging formula.</div><div><br /></div><div>This concept of a "shift mapping" and its Jacobian can be generalized to any kind of sampling - it's not just for path tracing.</div><div><br /></div><div><h3 style="text-align: left;">Conclusion</h3></div><div><br /></div><div>So that's kind of it. If you're careful about it, you can merge reservoirs in closed form (or, at least, in closed form for each sample in the reservoirs), which results in a pdf of the values in the reservoir that are informed by the <i>union</i> of samples of all the input reservoirs. This leads to a computation tree of merges which allows the number of samples to be aggregated exponentially over time, where each frame only has to do constant work per pixel. You can perform this reuse both spatially and temporally, if you remember information about the previous frame. The more samples you aggregate, the closer the pdf of the samples in the reservoir matches the target function, and the target function is formed by using MIS to approximate the rendering equation. 
This allows you to sample with a density very close to the final function you're integrating, which has the effect of reducing variance (noise) in the output image.</div><div><br /></div><div>ReSTIR also has some practical concerns, such as all the reuse causing an echo chamber of old data - the authors deal with that by weighting old and new data differently, to try to strike a balance between reuse (high sample counts) vs quickly adhering to new changes in geometry or whatever. It's a tunable parameter.<br /></div>Litherumhttp://www.blogger.com/profile/12738405376090442005noreply@blogger.com0tag:blogger.com,1999:blog-8778351438463999796.post-24819340349807737112023-10-17T18:34:00.006-07:002023-10-24T17:36:31.964-07:00ReSTIR Part 1: Building Blocks<p>ReSTIR is built on a bunch of other technologies. Let's discuss them one-by-one.</p><h3 style="text-align: left;">Rejection Sampling</h3><div><br /></div><div>Rejection sampling isn't actually used in ReSTIR, but it's useful to cover it anyway. It is a technique to convert samples from one PDF (probability density function) to another PDF.</div><div><br /></div><div>So, you start with the fact that you have 2 PDFs: a source PDF and a destination PDF. The first thing you do is you find a scalar "M" which, when scaling the source PDF, causes the source PDF to be strictly larger than the destination PDF, for all x coordinates. Then, for every sample in the source, accept that sample with a probability equal to destination PDF at the sample / (M * source PDF at the sample). You'll end up with fewer samples than you started with, but that's the price you pay. You can see how the scalar M is necessary to keep the probabilities between 0 and 1.</div><div><br /></div><div>The larger the distance between the destination PDF and M * the source PDF, the fewer samples will be accepted. So, if you pick M very conservatively, you'll end up with almost no samples accepted. 
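A minimal sketch of the accept/reject loop just described (all names here are hypothetical, not from any real library; it assumes M * sourcePdf(x) >= targetPdf(x) everywhere):

```cpp
#include <functional>
#include <random>
#include <vector>

// Rejection sampling: convert samples drawn from sourcePdf into samples
// distributed according to targetPdf. Each sample is accepted with
// probability targetPdf(x) / (M * sourcePdf(x)).
std::vector<double> rejectionSample(const std::vector<double>& sourceSamples,
                                    const std::function<double(double)>& sourcePdf,
                                    const std::function<double(double)>& targetPdf,
                                    double M,
                                    std::mt19937& rng) {
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    std::vector<double> accepted;
    for (double x : sourceSamples) {
        double acceptProbability = targetPdf(x) / (M * sourcePdf(x));
        if (uniform(rng) < acceptProbability)
            accepted.push_back(x); // Keep the sample.
        // Otherwise the sample is discarded -- the price rejection sampling pays.
    }
    return accepted;
}
```

Note that when the two PDFs are identical and M = 1, the acceptance probability is 1 everywhere and no samples are lost.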
That's a downside to rejection sampling.</div><div><br /></div><div>On the other hand, if the source PDF and the destination PDF are the same, then M = 1, and all the samples will be accepted. Which is good, because the input samples are exactly what should be produced by the algorithm.</div><div><br /></div><h3 style="text-align: left;">Sequential Importance Resampling</h3><div><br /></div><div>This is another technique used to convert samples from one PDF to another PDF. Compared to rejection sampling, we don't reject samples as we encounter them; instead, we pick ahead of time how many samples we want to accept.</div><div><br /></div><div>Again, you have a source PDF and a destination PDF. Go through all your samples, and compute a "score" which is the destination PDF at the sample / the source PDF at the sample. Now that you have all your scores, select N samples from them, with probabilities proportional to the scores. You might end up with duplicate samples; that's okay.</div><div><br /></div><div>Compared to rejection sampling, this approach has a number of benefits. The first is that you don't have to pick that "M" value. The scores are allowed to be any (non-negative) value - not necessarily between 0 and 1. This means you don't have to have any global knowledge about the PDFs involved.</div><div><br /></div><div>Another benefit is that you know how many samples you're going to get at the end - you can't end up in a situation where you accidentally don't end up with any samples.</div><div><br /></div><div>The downside to this algorithm is that you have to pick N up front ahead of time. But, usually that's not actually a big deal.</div><div><br /></div><div>The other really cool thing about SIR is that the source and destination PDFs don't actually have to be normalized. Because the scores can be arbitrary, it's okay if your destination PDF is actually just some arbitrary (non-normalized) function. 
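The SIR procedure just described can be sketched like this (hypothetical names; note that targetFunction does not need to be normalized, since the scores are only used proportionally):

```cpp
#include <functional>
#include <random>
#include <vector>

// Sequential Importance Resampling: given samples drawn from sourcePdf, pick N
// of them (with replacement) with probability proportional to the score
// targetFunction(x) / sourcePdf(x). Duplicates in the output are OK.
std::vector<double> resample(const std::vector<double>& sourceSamples,
                             const std::function<double(double)>& sourcePdf,
                             const std::function<double(double)>& targetFunction,
                             size_t N,
                             std::mt19937& rng) {
    std::vector<double> scores;
    for (double x : sourceSamples)
        scores.push_back(targetFunction(x) / sourcePdf(x));
    // Select N indices with probability proportional to each score.
    std::discrete_distribution<size_t> pick(scores.begin(), scores.end());
    std::vector<double> result;
    for (size_t i = 0; i < N; ++i)
        result.push_back(sourceSamples[pick(rng)]);
    return result;
}
```

Unlike rejection sampling, there's no M to choose and the output count N is fixed up front.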
This is super valuable, as we'll see later.</div><div><br /></div><h3 style="text-align: left;">Monte Carlo Integration</h3><div><br /></div><div>The goal of Monte Carlo integration is to compute an integral of a function. You simply sample it at random locations, and average the results.</div><div><br /></div><div>This assumes that the pdf you're using to generate random numbers is constant from 0 to 1.</div><div><br /></div><div>So, the formula is: 1/N * sum from 1 to N of f(x_i)</div><div><br /></div><h3 style="text-align: left;">Importance Sampling</h3><div><br /></div><div>The idea here is to improve upon basic Monte Carlo integration as described above. Certain samples will contribute to the final result more than others. Instead of sampling from a constant PDF, if you instead sample using a PDF that approximates the function being integrated, you'll more quickly approach the final answer.</div><div><br /></div><div>Doing so adds another term to the formula. It now is: 1/N * sum from 1 to N of f(x_i) / q(x_i), where q(x) is the PDF used to generate samples.</div><div><br /></div><div>The best PDF, of course, is proportional to the function being sampled - if you pick this, f(x_i) / q(x_i) will be a constant value for all i, which means you only need 1 term to calculate the final perfect answer. However, usually this is impractical - if you knew how to generate samples proportional to the function being integrated, you probably know enough to just integrate the function directly. For direct illumination, you can use things like the BRDF of the material, or the locations where the light sources are. Those will probably match the final answer pretty well.</div><div><br /></div><h3 style="text-align: left;">Multiple Importance Sampling</h3><div><br /></div><div>So now the question becomes how to generate that approximating function. 
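The importance-sampled estimator above can be sketched in a few lines (hypothetical names). It also illustrates the point about the ideal q: when q is exactly proportional to f, every term f(x_i)/q(x_i) is the same constant, so the estimate is exact no matter where the samples land:

```cpp
#include <functional>
#include <vector>

// Importance-sampled Monte Carlo estimate of an integral:
//   integral of f  ~=  (1/N) * sum_i f(x_i) / q(x_i)
// where the x_i were drawn with pdf q.
double estimateIntegral(const std::vector<double>& samples,
                        const std::function<double(double)>& f,
                        const std::function<double(double)>& q) {
    double sum = 0.0;
    for (double x : samples)
        sum += f(x) / q(x);
    return sum / samples.size();
}
```

For example, f(x) = 2x on [0, 1] integrates to exactly 1, and q(x) = 2x is a valid pdf on [0, 1] proportional to f, so every term is 1 and the estimate is exactly 1 for any sample set.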
If you look at the above formula, you'll notice that when f(x) is large, but q(x) is small, that leads to the worst possible situation - you are trying to compute an integral, but you're not generating any samples in an area that contributes heavily to it.</div><div><br /></div><div>The other extreme - where f(x) is small but q(x) is big - isn't actually harmful, but it is wasteful. You're generating all these samples that don't actually contribute much to the final answer.</div><div><br /></div><div>The idea behind MIS is that you can generate q(x) from multiple base formulas. For example, one of the base formulas might be the uniform distribution, and another might be proportional to the BRDF of the material you're shading, and another might be proportional to the directions of the lights in the scene. The idea is that, by linearly blending all these formulas, you can generate a better q(x) PDF. </div><div><br /></div><div>Incorporating the uniform distribution is useful to make sure that q(x) never gets too small anywhere, thereby solving the problem where f(x) is large and q(x) is small.</div><div><br /></div><h3 style="text-align: left;">Resampled Importance Sampling</h3><div><br /></div><div>RIS is what happens when you bring together importance sampling and SIR. You can use SIR to generate samples proportional to your approximating function. 
You can then use the importance sampling formula to compute the integral.</div><div><br /></div><div>If, when using SIR, your approximating function isn't normalized, there's another term added into the formula to re-normalize the result, which allows the correct integral to be calculated.</div><div><br /></div><div>This is really exciting, because it means that we can calculate integrals (like the rendering equation) by sampling in strategic places - and the pdf of those strategic places can be arbitrary (non-normalized) functions.</div><div><br /></div><h3 style="text-align: left;">Reservoir Sampling</h3><div><br /></div><div>Reservoir Sampling is a reformulation of SIR, to make it streamable. Recall that, in SIR, you encounter samples, and each sample produces a weight, and then you select N samples proportional to each sample's weight. Reservoir sampling allows you to select the N samples without knowing the total number of samples there are. The idea is that you keep a "reservoir" of N samples, and each time you encounter a new sample, you update the contents of the reservoir depending on the probabilities involved. The invariant is that the contents of the reservoir are distributed proportionally to the weights of all the samples encountered.</div><div><br /></div><div>The other cool thing about reservoir sampling is that 2 reservoirs can be joined together into a single reservoir, by looking only at the contents of the reservoirs, without requiring another full pass over all the data.</div><div><br /></div><h3 style="text-align: left;">Conclusion</h3><div><br /></div><div>So far, we've set ourselves up for success. We can calculate integrals, in a streamable way, by "resampling" our samples to approximate the final function being integrated. Being streamable is important, as we need to be able to update our results as we encounter new samples (perhaps across new frames, or across other pixels). 
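A single-sample weighted reservoir with a constant-time merge, as a rough sketch (this is a simplification for illustration; the real ReSTIR reservoir also tracks the sample count and the reweighting terms needed for an unbiased estimator):

```cpp
#include <random>

// A single-sample weighted reservoir. update() streams in one candidate at a
// time; the kept sample ends up chosen with probability proportional to its
// weight among everything seen so far. merge() combines two reservoirs by
// treating the other reservoir's sample as one candidate carrying its whole
// accumulated weight -- constant time, independent of how many samples either
// reservoir has encountered.
struct Reservoir {
    double sample = 0.0;    // The one sample we keep.
    double weightSum = 0.0; // Total weight of everything encountered.

    void update(double candidate, double weight, std::mt19937& rng) {
        weightSum += weight;
        std::uniform_real_distribution<double> uniform(0.0, 1.0);
        if (weightSum > 0.0 && uniform(rng) < weight / weightSum)
            sample = candidate; // Replace the kept sample with this probability.
    }

    void merge(const Reservoir& other, std::mt19937& rng) {
        update(other.sample, other.weightSum, rng);
    }
};
```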
The fact that you can merge reservoirs in constant time is super powerful, as it allows the merged result to behave as if it saw 2*N samples, while running only a constant-time algorithm. This can be done multiple times, thereby allowing for synthesis of an exponential number of samples, but each operation is constant time.</div>Litherumhttp://www.blogger.com/profile/12738405376090442005noreply@blogger.com0tag:blogger.com,1999:blog-8778351438463999796.post-11718693179345827902023-10-13T03:31:00.007-07:002023-10-13T03:52:46.432-07:00Implementing a GPU's Programming Model on a CPU<h2 style="text-align: left;">SIMT</h2><p>The programming model of a GPU uses what has been coined "single instruction multiple thread." The idea is that the programmer writes their program from the perspective of a single thread, using normal regular variables. So, for example, a programmer might write something like:</p><p><span style="font-family: courier;">int x = threadID;</span></p><p><span style="font-family: courier;">int y = 6;</span></p><p><span style="font-family: courier;">int z = x + y;</span></p><p>Straightforward, right? Then, they ask the system to run this program a million times, in parallel, with different threadIDs.</p><p>The system *could* simply schedule a million threads to do this, but GPUs do better than this. Instead, the compiler will transparently rewrite the program to use vector registers and instructions in order to run multiple "threads" at the same time. So, imagine you have a vector register, where each item in the vector represents a scalar from a particular "thread." In the above program, x corresponds to a vector of [0, 1, 2, 3, etc.] and y corresponds to a vector of [6, 6, 6, 6, etc.]. Then, the operation x + y is simply a single vector add operation of both vectors. 
This way, performance can be dramatically improved, because these vector operations are usually significantly faster than if you had performed each scalar operation one-by-one.</p><p>(This is in contrast to SIMD, or "single instruction multiple data," where the programmer explicitly uses vector types and operations in their program. The SIMD approach is suited for when you have a single program that has to process a lot of data, whereas SIMT is suited for when you have many programs and each one operates on its own data.)</p><p>SIMT gets complicated, though, when you have control flow. Imagine the program did something like:</p><p><span style="font-family: courier;">if (threadID < 4) {</span></p><p><span style="font-family: courier;"> doSomethingObservable();</span></p><p><span style="font-family: courier;">}</span></p><p>Here, the system has to behave as-if threads 0-3 executed the "then" block, but also behave as-if threads 4-n didn't execute it. And, of course, thread 0-3 want to take advantage of vector operations - you don't want to pessimize and run each thread serially. So, what do you do?</p><p>Well, the way that GPUs handle this is by using predicated instructions. There is a bitmask which indicates which "threads" are alive: within the above "then" block, that bitmask will have value 0xF. Then, all the vector instructions use this bitmask to determine which elements of the vector it should actually operate on. So, if the bitmask is 0xF, and you execute a vector add operation, the vector add operation is only going to perform the add on the 0th-3rd items in the vector. (Or, at least, it will behave "as-if" it only performed the operation on those items, from an observability perspective.) So, the way that control flow like this works is: all threads actually execute the "then" block, but all the operations in the block are predicated on a bitmask which specifies that only certain items in the vector operations should actually be performed. 
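The predicated control flow just described can be modeled in plain scalar code (a sketch with hypothetical names; a real GPU or AVX-512 does each loop below as one vector instruction):

```cpp
#include <cstdint>

constexpr int kLanes = 8; // hypothetical vector width

// A scalar model of SIMT predication: every lane runs the "then" block, but
// the lane mask decides which lanes' results are actually committed.
void predicatedIncrement(uint64_t values[kLanes], uint8_t mask) {
    for (int lane = 0; lane < kLanes; ++lane) {
        uint64_t result = values[lane] + 1; // All lanes compute...
        if (mask & (1u << lane))
            values[lane] = result;          // ...only live lanes commit.
    }
}

// The "if (threadID < 4)" from the text just computes a new mask:
uint8_t computeMask() {
    uint8_t mask = 0;
    for (int threadID = 0; threadID < kLanes; ++threadID)
        if (threadID < 4)
            mask |= (1u << threadID);
    return mask; // 0x0F: threads 0-3 are alive.
}
```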
The "if" statement itself just modifies the bitmask.</p><h2 style="text-align: left;">The Project</h2><p>AVX-512 is an optional instruction set on some (fairly rare) x86_64 machines. The exciting thing about AVX-512 is that it adds support for this predication bitmask thing. It has a bunch of vector registers (512 bits wide, named zmm0 - zmm31) and it also adds a set of predication bitmask registers (k0 - k7). The instructions that act upon the vector registers can be predicated on the value of one of those predication registers, to achieve the effect of SIMT.</p><p>It turns out I actually have a machine lying around in my home which supports AVX-512, so I thought I'd give it a go, and actually implement a compiler that compiles a toy language, but performs the SIMT transformation to use the vector operations and predication registers. The purpose of this exercise isn't really to achieve incredible performance - there are lots of sophisticated compiler optimizations which I am not really interested in implementing - but instead the purpose is really just as a learning exercise. Hopefully, by implementing this transformation myself for a toy language, I can learn more about the kinds of things that real GPU compilers do.</p><p>The toy language is one I invented myself - it's very similar to C, with some syntax that's slightly easier to parse. 
Programs look like this:</p><p><span style="font-family: courier;">function main(index: uint64): uint64 {</span></p><p><span style="font-family: courier;"> variable accumulator: uint64 = 0;</span></p><p><span style="font-family: courier;"> variable accumulatorPointer: pointer<uint64> = &accumulator;</span></p><p><span style="font-family: courier;"> for (variable i: uint64 = 0; i < index; i = i + 1) {</span></p><p><span style="font-family: courier;"> accumulator = *</span><span style="font-family: courier;">accumulatorPointer </span><span style="font-family: courier;">+ i;</span></p><p><span style="font-family: courier;"> }</span></p><p><span style="font-family: courier;"> return accumulator;</span></p><p><span style="font-family: courier;">}</span></p><p>It's pretty straightforward. It doesn't have things like ++ or +=. It also doesn't have floating-point numbers (which is fine, because AVX-512 supports vector integer operations). It has pointers, for loops, continue & break statements, early returns... the standard stuff.</p><h2 style="text-align: left;">Tour</h2><p>Let's take a tour, and examine how each piece of a C-like language gets turned into AVX-512 SIMT. I implemented this so it can run real programs, and tested it somewhat-rigorously - enough to be fairly convinced that it's generally right and correct.</p><h3 style="text-align: left;">Variables and Simple Math</h3><p>The most straightforward part of this system is variables and literal math. Consider:</p><p><span style="font-family: courier;">variable accumulator: uint64;</span></p><p>This is a variable declaration. Each thread may store different values into the variable, so its storage needs to be a vector. No problem, right?</p><p>What about if the variable's type is a complex type? 
Consider:</p><p><span style="font-family: courier;">struct Foo {</span></p><p><span style="font-family: courier;"> x: uint64;</span></p><p><span style="font-family: courier;"> y: uint64;</span></p><p><span style="font-family: courier;">}</span></p><p><span style="font-family: courier;">variable bar: Foo;</span></p><p>Here, we need to maintain the invariant that Foo.x has the same memory layout as any other uint64. This means that, rather than alternating x,y,x,y,x,y in memory, there instead has to be a vector for all the threads' x values, followed by another vector for all the threads' y values. This works recursively: if a struct has other structs inside it, the compiler will go through all the leaf types in the tree, turn each leaf type into a vector, and then lay them out in memory end-to-end.</p><p>Simple math is even more straightforward. Literal numbers have the same value no matter which thread you're running, so they just turn into broadcast instructions. The program says "3" and the instruction that gets executed is "broadcast 3 to every item in a vector". Easy peasy.</p><h3 style="text-align: left;">L-values and R-values</h3><p>In a C-like language, every value is categorized as either an "l-value" or an "r-value". An l-value is defined as having a location in memory, and r-values don't have a location in memory. The value produced by the expression "2 + 3" is an r-value, but the value produced by the expression "*foo()" is an l-value, because you dereferenced the pointer, so the thing the pointer points to is the location in memory of the resulting value. L-values can be assigned to; r-values cannot be assigned to. So, you can say things like "foo = 3 + 4;" (because "foo" refers to a variable, which has a memory location) but you can't say "3 + 4 = foo;". 
That's why it's called "l-value" and "r-value" - l-values are legal on the left side of an assignment.</p><p>At runtime, every expression has to produce some value, which is consumed by its parent in the AST. E.g, in "3 * 4 + 5", the "3 * 4" has to produce a "12" which the "+" will consume. The simplest way to handle l-values is to make them produce a pointer. This is so expressions like "&foo" work - the "foo" is an lvalue and produces a pointer that points to the variable's storage, and the & operator receives this pointer and produces that same pointer (unmodified!) as an r-value. The same thing happens in reverse for the unary * ("dereference") operator: it accepts an r-value of pointer type, and produces an l-value - which is just the pointer it just received. This is how expressions like "*&*&*&*&*&foo = 7;" work (which is totally legal and valid C!): the "foo" produces a pointer, which the & operator accepts and passes through untouched to the &, which takes it and passes it through untouched, all the way to the final *, which produces the same pointer as an lvalue, that points to the storage of foo.</p><p>The assignment operator knows that the thing on its left side must be an lvalue and therefore will always produce a pointer, so that's the storage that the assignment stores into. The right side can either be an l-value or an r-value; if it's an l-value, the assignment operator has to read from the thing it points to; otherwise, it's an r-value, and the assignment operator reads the value itself. 
This is generalized to every operation: it's legal to say "foo + 3", so the + operator needs to determine which of its parameters are l-values, and will thus produce pointers instead of values, and it will need to react accordingly to read from the storage the pointers point to.</p><p>All this stuff means that, even for simple programs where the author didn't even spell the name "pointer" anywhere in the program, or even use the * or & operators anywhere in the program, there will still be pointers internally used just by virtue of the fact that there will be l-values used in the program<span style="font-family: inherit;">. So, dealing with pointers is a core part of the language. They appear everywhere, whether the program author wants them to or not.</span></p><h3 style="text-align: left;"><span style="font-family: inherit;">Pointers</span></h3><p><span style="font-family: inherit;">If we now</span> think about what this means for SIMT, l-values produce pointers, but each thread has to get its own distinct pointer! That's because of programs like this:</p><p><span style="font-family: courier;">variable x: pointer<uint64>;</span></p><p><span style="font-family: courier;">if (...) {</span></p><p><span style="font-family: courier;"> x = &something;</span></p><p><span style="font-family: courier;">} else {</span></p><p><span style="font-family: courier;"> x = &somethingElse;</span></p><p><span style="font-family: courier;">}</span></p><p><span style="font-family: courier;">*x = 4;</span></p><p>That *x expression is an l-value. It's not special - it's just like any other l-value. The assignment operator needs to handle the fact that, in SIMT, the lvalue that *x produces is a vector of pointers, where each pointer can potentially be distinct. Therefore, that assignment operator doesn't actually perform a single vector store; instead, it performs a "scatter" operation. 
There's a vector of pointers, and there's a vector of values to store to those pointers; the assignment operator might end up spraying those values all around memory. In AVX-512, there's an <a href="https://www.felixcloutier.com/x86/vpscatterdd:vpscatterdq:vpscatterqd:vpscatterqq">instruction</a> that does this scatter operation.</p><p>(Aside: That scatter operation in AVX-512 uses a predication mask register (of course), but the instruction has a side-effect of clearing that register. That kind of sucks from the programmer's point of view - the program has to save and restore the value of the register just because of a quirk of this instruction. But then, thinking about it more, I realized that the memory operation might cause a page fault, which has to be handled by the operating system. The operating system therefore needs to know which address triggered the page fault, so it knows which pages to load. The predication register holds this information - as each memory access completes, the corresponding bit in the predication register gets set to false. So the kernel can look at the register to determine the first predication bit that's high, which indicates which pointer in the vector caused the fault. So it makes sense why the operation will clear the register, but it is annoying to deal with from the programmer's perspective.)</p><p>And, of course, the operation can also say "foo = *x;" which means that there also has to be a <a href="https://www.felixcloutier.com/x86/vpgatherqd:vpgatherqq">gather operation</a>. Sure. 
Something like "*x = *y;" will end up doing both a gather and a scatter.</p><h3 style="text-align: left;">Copying</h3><p>Consider a program like:</p><p><span style="font-family: courier;">struct Foo {</span></p><p><span style="font-family: courier;"> x: uint64;</span></p><p><span style="font-family: courier;"> y: uint64;</span></p><p><span style="font-family: courier;">}</span></p><p><span style="font-family: courier;">someVariableOfFooType = aFunctionThatReturnsAFoo();</span></p><p>That initializer needs to set both fields inside the Foo. Naively, a compiler might be tempted to use a memcpy() to copy the contents - after all, the contents could be arbitrarily complex, with nested structs. However, that won't work for SIMT, because only some of the threads might be alive at this point in the program. Therefore, that assignment has to only copy the items of the vectors for the threads that are alive; it can't copy the whole vectors because that can clobber other entries in the destination vector which are supposed to persist.</p><p>So, all the stores to someVariableOfFooType need to be predicated using the predication registers - we can't naively use a memcpy(). This means that every assignment needs to actually perform n memory operations, where n is the number of leaf types in the struct being assigned - because those memory operations can be predicated correctly using the predication registers. We have to copy structs leaf-by-leaf. This means that the number of instructions to copy a type is proportional to the complexity of the type. Also, both the left side and the right side may be l-values, which means each leaf-copy could actually be a gather/scatter pair of instructions. 
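Here's the leaf-by-leaf idea as a Python sketch (again illustrative - in the generated code the per-leaf stores are masked vector instructions, not loops):

```python
# A struct variable is stored as one vector per leaf field, and an
# assignment is one predicated store per leaf - never a bulk memcpy.
NUM_THREADS = 8

dest = {"x": [9] * NUM_THREADS, "y": [9] * NUM_THREADS}   # someVariableOfFooType
src  = {"x": [1] * NUM_THREADS, "y": [2] * NUM_THREADS}   # the returned Foo

def assign_struct(dest, src, mask):
    for leaf in dest:                      # "x", then "y": leaf-by-leaf
        for lane in range(NUM_THREADS):
            if mask & (1 << lane):         # predicated store
                dest[leaf][lane] = src[leaf][lane]

assign_struct(dest, src, mask=0b00001111)  # only threads 0-3 are alive
# Threads 4-7 keep their old 9s; a memcpy would have clobbered them.
```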
So, depending on the complexity of the type and the context of the assignment, that single "=" operation might actually generate a huge amount of code.</p><h3 style="text-align: left;">Pointers (Part 2)</h3><p>There's one other decision that needs to be made about pointers: Consider:</p><p><span style="font-family: courier;">variable x: uint64;</span></p><p><span style="font-family: courier;">... &x ...</span></p><p>As I described above, the storage for the variable x is a vector (each thread owns one value in the vector). &x produces a vector of pointers, sure. The question is: should all the pointer values point to the beginning of the x vector? Or should each pointer value point to its own slot inside the x vector? If they point to the beginning, that makes the & operator itself really straightforward: it's just a broadcast instruction. But it also means that the scatter/gather operations get more complicated: they have to offset each pointer by a different amount in order to scatter/gather to the correct place. On the other hand, if each pointer points to its own slot inside x, that means the scatter/gather operations are already set up correctly, but the & operation itself gets more complicated.</p><p>Both options will work, but I ended up making all the pointers point to the beginning of x. The reason for that is programs like:</p><p><span style="font-family: courier;">struct Foo {</span></p><p><span style="font-family: courier;"> x: uint32;</span></p><p><span style="font-family: courier;"> y: uint64;</span></p><p><span style="font-family: courier;">}</span></p><p><span style="font-family: courier;">variable x: </span><span style="font-family: courier;">Foo</span><span style="font-family: courier;">;</span></p><p><span style="font-family: courier;">... &x ...</span></p><p>If I picked the other option, and had the pointers point to their own slot inside x, it isn't clear which member of Foo they should be pointing inside of. 
I could have, like, found the first leaf, and made the pointers point into that, but what if the struct is empty... It's not very elegant.</p><p>Also, if I'm assigning to x or something where I need to populate every field, because every copy operation has to copy leaf-by-leaf, I'm going to have to be modifying the pointers to point to each field. If one of the fields is a uint32 and the next one is a uint64, I can't simply add a constant amount to each pointer to get it to point to its slot in the next leaf. So, if I'm going to be mucking about with individual pointer values for each leaf in a copy operation, I might as well have the original pointer point to the overall x vector rather than individual fields, because pointing to individual fields doesn't actually make anything simpler.</p><h3 style="text-align: left;">Function Calls</h3><p>This language supports function pointers, which are callable. This means that you can write a program like this (taken from the test suite):</p><p><span style="font-family: courier;">function helper1(): uint64 ...</span></p><p><span style="font-family: courier;">function helper2(): uint64 ...</span></p><p><span style="font-family: courier;">function main(index: uint64): uint64 {</span></p><p><span style="white-space: normal;"><span style="font-family: courier;"> variable x: FunctionPointer&lt;uint64&gt;;</span></span></p><p><span style="font-family: courier;"> if (index < 3) {</span></p><p><span style="font-family: courier;"> x = helper1;</span></p><p><span style="font-family: courier;"> } else {</span></p><p><span style="font-family: courier;"> x = helper2;</span></p><p><span style="font-family: courier;"> }</span></p><p><span style="font-family: courier;"> return x();</span></p><p><span style="font-family: courier;">}</span></p><p>Here, that call to x() allows different threads to point to different functions. This is a problem for us, because all the "threads" that are running share the same instruction pointer. 
We can't actually have some threads call one function and other threads call another function. So, what we have to do instead is to set the predication bitmask to only the "threads" which call one function, then call that function, then set the predication bitmask to the remaining threads, then call the other function. Both functions get called, but the only "threads" alive during each call are the ones that are supposed to actually be running the function.</p><p>This is tricky to get right, though, because anything could be in that function pointer vector. Maybe all the threads ended up with the same pointers! Or maybe each thread ended up with a different pointer! You *could* do the naive thing and do something like:</p><p><span style="font-family: courier;">for i in 0 ..< numThreads:</span></p><p><span style="font-family: courier;"> predicationMask = originalPredicationMask & (1 << i)</span></p><p><span style="font-family: courier;"> call function[i]</span></p><p>But this has really atrocious performance characteristics. This means that every call actually calls numThreads functions, one-by-one. But each one of those functions can have more function calls! The execution time will be proportional to numThreads ^ callDepth. Given that function calls are super common, this exponential runtime isn't acceptable.</p><p>Instead, what you have to do is gather up and deduplicate function pointers. 
You need to do something like this instead:</p><p><span style="font-family: courier;">func generateMask(functionPointers, target):</span></p><p><span style="font-family: courier;"> mask = 0;</span></p><p><span style="font-family: courier;"> for i in 0 ..< numThreads:</span></p><p><span style="font-family: courier;"> if functionPointers[i] == target:</span></p><p><span style="font-family: courier;"> mask |= 1 << i;</span></p><p><span style="font-family: courier;"> return mask;</span></p><p><span style="font-family: courier;">for pointer in unique(functionPointers):</span></p><p><span style="font-family: courier;"> predicationMask = originalPredicationMask & generateMask(functionPointers, pointer)</span></p><p><span style="font-family: courier;"> call pointer</span></p><p>I couldn't find an instruction in the Intel instruction set that did this. This is also a complicated enough algorithm that I didn't want to write this in assembly and have the compiler emit the instructions for it. So, instead, I wrote it in C++, and had the compiler emit code to call this function at runtime. Therefore, this routine can be considered a sort of "runtime library": a function that automatically gets called when the code the author writes does a particular thing (in this case, "does a particular thing" means "calls a function").</p><p>Doing it this way means that you don't get exponential runtime. Indeed, if your threads all have the same function pointer value, you get constant runtime. And if the threads diverge, the slowdown will be at most proportional to the number of threads. 
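Here's a runnable Python rendition of that dedup-and-call loop. The names follow the pseudocode above; in the actual compiler this lives in a C++ runtime-library routine, and the inner per-lane Python loop stands in for a single predicated call:

```python
NUM_THREADS = 8

def generate_mask(function_pointers, target):
    # Bitmask of the lanes whose function pointer equals target.
    mask = 0
    for i in range(NUM_THREADS):
        if function_pointers[i] == target:
            mask |= 1 << i
    return mask

def call_indirect(function_pointers, original_predication_mask):
    # Call each distinct target once, predicated on the lanes that chose it.
    results = [None] * NUM_THREADS
    for pointer in dict.fromkeys(function_pointers):   # unique(), order-preserving
        predication_mask = original_predication_mask & generate_mask(function_pointers, pointer)
        if predication_mask == 0:
            continue   # never call a function with an empty mask
        for i in range(NUM_THREADS):                   # stand-in for one predicated call
            if predication_mask & (1 << i):
                results[i] = pointer()
    return results

def helper1(): return 1
def helper2(): return 2

# Threads with index < 3 chose helper1, the rest chose helper2:
results = call_indirect([helper1] * 3 + [helper2] * 5, 0b11111111)
# Only 2 calls happen (one per distinct pointer), not 8.
```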
You'll never run a function where the predication bitmask is 0, which means there is a floor on how slow the worst case can be - it will never get worse than having each thread individually diverge from all the other threads.</p><h3 style="text-align: left;">Control Flow</h3><p>As described above, control flow (meaning: if statements, for loops, breaks, continues, and returns) is implemented by changing the value of the predication bitmask register. The x86_64 instruction set has instructions that do this.</p><p>There are 2 ways to handle the predication registers. One way is to observe the fact that there are 8 predication registers, and to limit the language to only allow 8 (7? 6? 3?) levels of nested control flow. If you pick this approach, the code that you emit inside each if statement and for loop would use a different predication register. (Sibling if statements can use the same predication register, but nested ones have to use different predication registers.) </p><p>I elected to not add this restriction, but instead to save and restore the values of the predication register to the stack. This is slower, but it means that control flow can be nested without limit. So, all the instructions I emit are predicated on the k1 register - I never use k2 - k7 (except - I use k2 to save/restore the value of k1 during scatter/gather operations because those clobber the value of whichever register you pass into it).</p><p>For an "if" statement, you actually need to save 2 predication masks:</p><p></p><ol style="text-align: left;"><li>One that saves the predication mask that was incoming to the beginning of the "if" statement. You need to save this so that, after the "if" statement is totally completed, you can restore it back to what it was originally</li><li>If there's an "else" block, you also need to save the bitmask of the threads that should run the "else" block. 
You might think that you can compute this value at runtime instead of saving/loading it (it would be the inverse of the threads that ran the "then" block, and-ed with the set of incoming threads) but you actually can't do that because break and continue statements might actually need to modify this value. Consider if there's a break statement as a direct child of the "then" block - at the end of the "then" block, there will be no threads executing (because they all executed the "break" statement). If you then use the set of currently executing threads to try to determine which should execute the "else" block, you'll erroneously determine that all threads (even the ones which ran the "then" block!) should run the "else" block. Instead, you need to compute up-front the set of threads that should be running the "else" block, save it, and re-load it when starting to execute the "else" block.</li></ol> For a "for" loop, you also need to save 2 predication masks:<p></p><p></p><ol style="text-align: left;"><li>Again, you need to store the incoming predication mask, to restore it after the loop has totally completed</li><li>You also need to save and restore the set of threads which should execute the loop increment operation at the end of the loop. The purpose of saving and restoring this is so that break statements can modify it. Any thread that executes a break statement needs to remove itself from the set of threads which executes the loop increment. Any thread that executes a continue statement needs to remove itself from executing *until* the loop increment. Again, this is a place where you can't recompute the value at runtime because you don't know which threads will execute break or continue statements.</li></ol><div>If you set up "if" statements and "for" loops as above, then break and continue statements actually end up really quite simple. 
First, you can verify statically that no statement directly follows them - they should be the last statement in their block.</div><div><br /></div><div>Then, what a break statement does is:</div><div><ol style="text-align: left;"><li>Find the deepest loop it's inside of, and find all the "if" statements between that loop and the break statement</li><li>For each of the "if" statements:</li><ol><li>Emit code to remove all the currently running threads from both of the saved bitmasks associated with that "if" statement. Any thread that executes a break statement should not run an "else" block and should not come back to life after the "if" statement.</li></ol><li>Emit code to remove all the currently running threads from just the second bitmask associated with the loop. (This is the one that gets restored just before the loop increment operation). Any thread that executes a break statement should not execute the loop increment.</li></ol>A "continue" statement does the same thing except for the last step (those threads *should* execute the loop increment). And a "return" statement removes all the currently running threads from all bitmasks from every "if" statement and "for" loop it's inside of.</div><div><br /></div><div>This is kind of interesting - it means an early return doesn't actually stop the function or perform a jmp or ret. The function still continues executing, albeit with a modified predication bitmask, because there might still be some threads "alive." It also means that "if" statements don't actually need to have any jumps in them - in the general case, both the "then" block and the "else" block will be executed, so instead of jumps you can just modify the predication bitmasks - and emit straight-line code. 
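Here's a small Python model of why the "else" mask has to be saved up front rather than recomputed. A "block" here is just a function that takes the mask it runs under and returns the mask of threads still live at its end (all zeroes if every thread broke); the names are made up for illustration:

```python
NUM_THREADS = 8
trace = []   # records (block name, mask it actually ran with)

def then_all_break(mask):
    trace.append(("then", mask))
    return 0                      # every thread executed "break"

def else_block(mask):
    trace.append(("else", mask))
    return mask

def lower_if(incoming, cond_mask):
    saved_else = incoming & ~cond_mask      # computed and saved up front
    then_all_break(incoming & cond_mask)
    else_block(saved_else)                  # re-loaded, not recomputed
    return incoming                         # restore the incoming mask after

def lower_if_naive(incoming, cond_mask):
    live = then_all_break(incoming & cond_mask)
    else_block(incoming & ~live)            # WRONG: derived from the live set
    return incoming

lower_if(0b11111111, cond_mask=0b00001111)
# the else block correctly ran with mask 0b11110000

trace.clear()
lower_if_naive(0b11111111, cond_mask=0b00001111)
# the else block wrongly ran with mask 0b11111111 - even the threads that broke!
```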
(Of course, you'll want the "then" block and the "else" block to both jump to the end if they find that they start executing with an empty predication bitmask, but this isn't technically necessary - it's just an optimization.)</div><div><br /></div><h3 style="text-align: left;">Shared Variables</h3><div><br /></div><div>When you're using the SIMT approach, one thing that becomes useful is the ability to interact with external memory. GPU threads don't really perform I/O as such, but instead just communicate with the outside world via reading/writing global memory. This is a bit of a problem for SIMT-generated code, because it will assume that the type of everything is a vector type - one for each thread. But, when interacting with external memory, all "threads" see the same values - a shared int is just an int, not a vector of ints.</div><div><br /></div><div>That means we now have a 3rd kind of value classification. Previously, we had l-values and r-values, but l-values can be further split into vector-l-values and scalar-l-values. A pointer type now needs to know statically whether it points to a vector-l-value or a scalar-l-value. (This information needs to be preserved as we pass it from l-value pointers through the & and * operators.) In the language, this looks like "pointer&lt;uint64 | shared&gt;".</div><div><br /></div><div>It turns out that, beyond the classical type-checking analysis, it's actually pretty straightforward to deal with scalar-l-values. They are strictly simpler than vector-l-values.</div><div><br /></div><div>In the language, you can declare something like:</div><div><br /></div><div><span style="font-family: courier;">variable&lt;shared&gt; x: uint64;</span></div><div><span style="font-family: courier;">x = 4;</span></div><div><br /></div><div>which means that it is shared among all the threads. 
If you then refer to x, that reference expression becomes a scalar-l-value, and produces a vector of pointers, all of which point to x's (shared) storage. The "=" in the "x = 4;" statement now has to be made aware that:</div><div><ol style="text-align: left;"><li>If the left side is a vector-l-value, then the scatter operation needs to offset each pointer in the vector to point to the specific place inside the destination vectors that the memory operations should write to</li><li>But, if the left side is a scalar-l-value, then no such offset needs to occur. The pointers already point to the one single shared memory location. Everybody points to the right place already.</li></ol>(And, of course, same thing for the right side of the assignment, which can be either a vector-l-value, a scalar-l-value, or an r-value.)</div><div><br /></div><h3 style="text-align: left;">Comparisons and Booleans</h3><div><br /></div><div>AVX-512 of course has <a href="https://www.felixcloutier.com/x86/vpcmpq:vpcmpuq">vector compare instructions</a>. The result of these vector comparisons *isn't* another vector. Instead, you specify one of the bitmask registers to receive the result of the comparison. This is useful if the comparison is the condition of an "if" statement, but it's also reasonable for a language to have a boolean type. If the boolean type is represented as a normal vector holding 0s and 1s, there's an elegant way to convert between the comparison and the boolean.</div><div><br /></div><div>The comparison instructions look like:</div><div><br /></div><div><span style="font-family: courier;">vpcmpleq %zmm1,%zmm0,%k2{%k1}</span></div><div><br /></div><div>If you were to speak this aloud, what you'd say is "do a vector packed compare for less-than-or-equal-to on the quadwords in zmm0 and zmm1, put the result in k2, and predicate the whole operation on the value of k1." 
Importantly, the operation itself is predicated, and the result can be put into a different predication register. This means that, after you execute this thing, you know which threads executed the instruction (because k1 is still there) but you also know the result of the comparison (because it's in k2).</div><div><br /></div><div>So, what you can do is: use k1 to broadcast a constant 0 into a vector register, and then use k2 to broadcast a constant 1 into the same vector register. This will leave a 1 in all the spots where the test succeeded, and a 0 in all the remaining spots. Pretty cool!</div><div><br /></div><div>If you want to go the other way, to convert from a boolean to a mask, you can just compare the boolean vector to a broadcasted 0, and compare for "not equal." Pretty straightforward.</div><div><br /></div><h3 style="text-align: left;">Miscellanea</h3><div><br /></div><div>I'm using my own calling convention (and ABI) to pass values into and out of functions. It's for simplicity - the x64 calling convention is kind of complicated if you're using vector registers for everything. One of the most useful decisions I made was to formalize this calling convention by encoding it in a C++ class in the compiler. Rather than having various different parts of the compiler just assume they knew where parameters were stored, it was super useful to create a single source of truth about the layout of the stack at call frame boundaries. I ended up changing the layout a few different times, and having this single source of truth meant that such changes only required updating a single class, rather than making a global change all over the compiler.</div><div><br /></div><div>Inventing my own ABI also means that there will be a boundary, where the harness will have to call the generated code. At this boundary, there has to be a trampoline, where the contents of the stack get rejiggered to set it up for the generated code to look in the right place for stuff. 
And, this trampoline can't be implemented in C++, because it has to do things like align the stack pointer register, which you can't do in C++. AVX-512 requires vectors to be loaded and stored at 64-byte alignment, but Windows only requires 16-byte stack alignment. So, in my own ABI I've said "stack frames are all aligned to 64-byte boundaries" which means the trampoline has to enforce this before the entry point can be run. So the trampoline has to be written in assembly.</div><div><br /></div><div>The scatter/gather operations (which are required for l-values to work) only operate on 32-bit and 64-bit values inside the AVX-512 registers. This means that the language can only contain 32-bit and 64-bit types. An AVX-512 vector, which is 512 bits = 64 bytes wide, can hold 8 64-bit values, or 16 32-bit values. However, the entire SIMT programming model requires you to pick up front how many "threads" will be executing at once. If some calculations in your program can calculate 8 values at a time, and some other calculations can calculate 16 values at a time, it doesn't matter - you have to pessimize and only use 8-at-a-time. So, if the language contains 64-bit types, then the max number of "threads" you can run at once is 8. If the language only contains 32-bit types (and you get rid of 64-bit type support, including 64-bit pointers), then you can run 16 "threads" at once. For me, I chose to include 64-bit types and do 8 "threads" at a time, because I didn't want to limit myself to the first 4GB of memory (the natural stack and heap are already farther than 4GB apart from each other in virtual address space, so I'd have to, like, mess with Windows's VM subsystem to allocate my own stack/heap and put them close to each other, and yuck I'll just use 64-bit pointers thankyouverymuch).</div><div><br /></div><h3 style="text-align: left;">Conclusion</h3><div><br /></div><div>And that's kind of it. 
I learned a lot along the way - there seem to be good reasons why, in many shading languages (which use this SIMT model),</div><div><ul style="text-align: left;"><li>Support for 8-bit and 16-bit types is rare - the scatter/gather operations might not support them.</li><li>Support for 64-bit types is also rare - the smaller your types, the more parallelism you get, for a particular vector bit-width.</li><li>Memory loads and stores turn into scatter/gather operations instead.</li><ul><li>A sophisticated compiler could optimize this, and turn some of them into vector loads/stores instead.</li><li>This might be why explicit support for pointers is relatively rare in shading languages - no pointers means you can _always_ use vector load/store operations instead of scatter/gather operations (I think).</li></ul><li>You can't treat memory as a big byte array and memcpy() stuff around; instead you need to treat it logically and operate on well-typed fields, so the predication registers can do the right thing.</li><li>Shading languages usually don't have support for function pointers, because calling them ends up becoming a loop (with a complicated pointer coalescing phase, no less) in the presence of non-uniformity. Instead, it's easy for the language to just say "You know what? All calls have to be direct. Them's the rules."</li><li>Pretty much every shading language has a concept of multiple address spaces. The need for them naturally arises when you have local variables which are stored in vectors, but you also need to interact with global memory, which every thread "sees" identically. Address spaces and SIMT are thoroughly intertwined.</li><li>I thought it was quite cool how AVX-512 complemented the existing (scalar) instruction set. E.g. all the math operations in the language use vector operations, but you still use the normal call/ret instructions. You use the same rsp/rbp registers to interact with the stack. The vector instructions can still use the SIB byte. 
The broadcast instruction broadcasts from a scalar register to a vector register. Given that AVX-512 came out of the Larrabee project, it strikes me as a very Intel-y way to build a GPU instruction set.</li></ul></div>Litherumhttp://www.blogger.com/profile/12738405376090442005noreply@blogger.com6tag:blogger.com,1999:blog-8778351438463999796.post-68903035912769824742022-11-09T20:05:00.006-08:002022-12-01T11:01:25.000-08:00Video Splitter<p>I record ~all the games I play. Not for any particular reason, but really just because sometimes it's fun to go back and re-watch them, to repeat a positive experience. Usually, I just upload the raw footage <a href="https://www.youtube.com/user/sylemx/playlists">directly to YouTube</a>, but this time I wanted to see if I could do a little better.</p><h1 style="text-align: left;">Problem Statement</h1><div><div><a href="https://www.youtube.com/watch?v=51cLM3pp0II&list=PLCIcaY7cW31SD1o3XfM24eUQw89ae1r41">I just finished playing Cyberpunk 2077</a>, and recorded 77 hours of footage. This footage is split across 41 video files. Also, when recording the footage, I wasn't particularly meticulous about stopping recording when I had to step away for a little bit. Therefore, there are a bunch of times in the videos when I stepped away for a few minutes, and nothing much is happening on-screen.</div><div><br /></div><div>I want to take these videos, and concatenate them into a few long videos. Ideally the result would be a single video, but YouTube doesn't allow you to upload anything longer than 10 hours, so the result will be a few 10-hour-long videos. 
Also, I'd like to identify the periods of time in the footage when nothing much is happening, and remove those periods from the result.</div><div><br /></div><div>Also, just for the sake of convenience, I'd like to do this with as few libraries as possible, focusing just on using the software that's built into my Mac.</div></div><div><br /></div><h1 style="text-align: left;">Plan</h1><div><div>There are 3 phases:</div><div><ol style="text-align: left;"><li>Feature Extraction</li><li>Partitioning</li><li>Writing out the final result.</li></ol></div><div>Let's take these one at a time.</div></div><div><br /></div><h2 style="text-align: left;">Feature Extraction</h2><div><div>This is the part where I analyze the videos to pull out useful information from them. This entails going through all the videos frame-by-frame, and mapping each frame to a set of metrics. I ended up using 5 metrics:</div><div><ol style="text-align: left;"><li>The number of words that appear in the frame</li><li>The cross-correlation from the previous frame to the frame in question</li><li>The optical-flow from the previous frame to the frame in question</li><li>The standard deviation of the optical flow</li><li>The average luminance of the frame</li></ol></div><div>Let's consider these one-by-one.</div></div><div><br /></div><h3 style="text-align: left;">Number of Words</h3><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgy3gJEVq-PtduGenorA8CUsVxj16A90K3Y91HsnLnNgyPyk1qFt3XvzJvRxNySz7K6Chqrmxfrk2zkwd3Z2WxTQNiKUktLaSfCSDuiTOK0vT5bcBe5RIY3MSGEwR-uYWyTcvA1mkVechsXJJm_PGbpzb1aUFI4lBGFnKvKIstHxC-84UoRTrbMq_4X2g/s1536/Screenshot%202022-11-09%20at%208.23.02%20PM.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1214" data-original-width="1536" height="506" 
src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgy3gJEVq-PtduGenorA8CUsVxj16A90K3Y91HsnLnNgyPyk1qFt3XvzJvRxNySz7K6Chqrmxfrk2zkwd3Z2WxTQNiKUktLaSfCSDuiTOK0vT5bcBe5RIY3MSGEwR-uYWyTcvA1mkVechsXJJm_PGbpzb1aUFI4lBGFnKvKIstHxC-84UoRTrbMq_4X2g/w640-h506/Screenshot%202022-11-09%20at%208.23.02%20PM.png" width="640" /></a></div><div class="separator" style="clear: both; text-align: center;"><br /></div><div>This is pretty straightforward to gather. Apple's Vision framework contains <a href="https://developer.apple.com/documentation/vision/vnrecognizetextrequest">API to recognize the text in an image</a>. There's one other step beyond simply using this API, though - the results contain strings, but each string may contain many words. I'm interested in the number of words, rather than the number of strings in the image. So, I use another Apple API - <a href="https://developer.apple.com/documentation/corefoundation/cfstringtokenizer-rf8"><span style="font-family: courier;">CFStringTokenizer</span></a> to pull out the words from the string. Then I simply count the number of words. 
Easy peasy.</div><div><br /></div><h3 style="text-align: left;">Cross Correlation</h3><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhYaaa6hUJxQs6tqdG8lUYeMhTgoBRvDGn2xvBamryo6rKs-6xAYq0ggAzj7CKpy-MgJeVK0BR-qnfqLUycQ3bdCtNvBiJRtkGjSZS-lnkO7GtD0rUp0ebPdaNhooWv6vmf7oYNbKYtLiP4k7t1GXgGYza2epIgZ1P9FVU_KKtSYHi0sTPl-ZlCVT2Xvw/s1526/Screenshot%202022-11-09%20at%208.23.11%20PM.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1224" data-original-width="1526" height="514" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhYaaa6hUJxQs6tqdG8lUYeMhTgoBRvDGn2xvBamryo6rKs-6xAYq0ggAzj7CKpy-MgJeVK0BR-qnfqLUycQ3bdCtNvBiJRtkGjSZS-lnkO7GtD0rUp0ebPdaNhooWv6vmf7oYNbKYtLiP4k7t1GXgGYza2epIgZ1P9FVU_KKtSYHi0sTPl-ZlCVT2Xvw/w640-h514/Screenshot%202022-11-09%20at%208.23.11%20PM.png" width="640" /></a></div><br /><div><br /></div><div><div>The goal here is to cross-correlate adjacent frames of video, to see roughly how much is changing from frame to frame. This one was the most difficult to implement. </div><div><br /></div><div>Cross correlation is usually defined a little differently than what I'm interested in. Classically, cross correlation is a function that takes 2 functions as input, and produces another function. The idea is that you take the 2 functions, multiply them together and integrate the result, to produce a particular value. However, you often want to actually displace the two input functions away from each other. The size of that displacement is why the output of cross-correlation is a function - the input of that function represents the displacement by which the two input functions are separated from each other before multiplying and integrating. (I'm assuming here that the functions are real-valued.)</div><div><br /></div><div>For me, my 2 functions are discrete - there are X and Y inputs, and the output is the color value of that pixel. 
So, instead of multiplying and integrating like you would for continuous functions, this operation actually becomes simply a dot product. I'm also using just the luminance of the image, so that if a color changes chroma but luminance stays the same, that still counts as a high cross-correlation. Also, having a 1-dimensional result is a little more convenient than if I had a 3-dimensional result (by treating red, green, and blue as distinct).</div><div><br /></div><div>However, I don't want my output to be a whole function - I just want a single value. Usually, in the sciences, they do this by maximizing the value of the output function - often by trying every input, and reporting the maximum result achievable. This would be great: if I'm turning the camera in the game, this operation would find the location where the previous frame best matches up with the current frame. Unfortunately, it's too slow - for a 4K image, there are 35,389,440 inputs to try, and each trial operates on 2 entire 4K images. So, instead, I just set displacement = 0, and assume that adjacent video frames aren't changing a huge amount from frame to frame.</div><div><br /></div><div><div>From reading <a href="https://en.wikipedia.org/wiki/Cross-correlation">Wikipedia's article about cross correlation</a>, it looks like what I want is the "zero normalized cross-correlation" which normalizes the values in the image around the mean, and divides by the standard deviation. The idea is that if the image gets brighter as a whole, but nothing else changes, that should count the same as if it didn't get brighter. It's measuring relative changes, rather than absolute changes.</div></div><div><br /></div><div>So this all boils down to:</div><div><ol style="text-align: left;"><li>Calculate the luminance of both images</li><li>Calculate the average and standard deviation of the luminance for each image. 
This ignores geometry and just treats all the pixels as an unordered set.</li><li>For each image, create a new "zero-normalized" image, which is (old image - mean) / standard deviation</li><li>Perform a dot product of the two zero-normalized images. The result of this is a single scalar.</li><li>Divide the scalar by the number of pixels</li></ol></div><div>Okay, how to actually do this in code?</div><div><br /></div><h4 style="text-align: left;">Luminance</h4><div><br /></div><div>Calculating the luminance is actually a little tricky. The most natural way I found was to convert the image into the XYZ colorspace, whose Y channel represents luminance. I'm doing this using <a href="https://developer.apple.com/documentation/metalperformanceshaders/mpsimageconversion"><span style="font-family: courier;">MPSImageConversion</span></a> which can convert images between any 2 color spaces. It operates on <span style="font-family: courier;">MTLTexture</span>s, so I had to bounce through Core Image to actually produce them (via <span style="font-family: courier;">CIRenderDestination</span>). I then broadcasted the Y channel to every channel of the result, which isn't strictly necessary, but makes it more convenient to use later - I can't forget which channel is the correct channel to use. I did this broadcast using <span style="font-family: courier;">CIFilter.colorMatrix</span>.</div><div><br /></div><h4 style="text-align: left;">Mean</h4><div><br /></div><div>Okay, so now we've got luminance, let's calculate the mean, which is pretty straightforward - Core Image already has <span style="font-family: courier;">CIFilter.areaAverage</span> which produces a 1x1 image of the average. 
We can tile that 1x1 image using <span style="font-family: courier;">CIFilter.affineTile</span> so it's as big as the input image.</div><div><br /></div><h4 style="text-align: left;">Subtraction</h4><div><br /></div><div>Subtracting an image from its mean now is actually kind of tricky - Both <span style="font-family: courier;">CIDifferenceBlendMode</span> and <span style="font-family: courier;">CISubtractBlendMode</span> seem to try really hard to not produce negative values. What I had to do in the end was to add the negative of the second image (instead of subtracting). Adding is just <span style="font-family: courier;">CIFilter.additionCompositing</span>, and negating is just <span style="font-family: courier;">CIFilter.multiplyCompositing</span> with an image full of <span style="font-family: courier;">-1</span>s. However, you can't use <span style="font-family: courier;">CIConstantColorGenerator</span> to fill an image with <span style="font-family: courier;">-1</span>s, because that will clamp to <span style="font-family: courier;">0</span>. Instead you have to actually create a new <span style="font-family: courier;">CIImage</span> from a 1x1 CPU-side bitmap buffer that holds a <span style="font-family: courier;">-1</span>, and then use <span style="font-family: courier;">CIFilter.affineTile</span> to make it big enough. Also, in the <span style="font-family: courier;">CIContext</span>, you have to set the working format to one that can represent negative values (normally Core Image uses unsigned values); I'm using <span style="font-family: courier;">CIFormat.RGBAf</span>.</div><div><br /></div><h4 style="text-align: left;">Standard Deviation</h4><div><br /></div><div>Okay, so the next step is to calculate standard deviation. The standard deviation is the square root of variance, and to calculate variance, we take all the pixels, subtract them from the mean like we just did above, square the result, then find the average of the result values. 
Luckily, we already did all the hard parts - squaring the result is just <span style="font-family: courier;">CIFilter.multiplyCompositing</span>, and we can use the same <span style="font-family: courier;">CIFilter.areaAverage</span> & <span style="font-family: courier;">CIFilter.affineTile</span> to find the average. No problem. We can then take the square root to find standard deviation by using <span style="font-family: courier;">CIFilter.gammaAdjust</span> with a gamma of <span style="font-family: courier;">0.5</span>.</div><div><br /></div><h4 style="text-align: left;">Division</h4><div><br /></div><div>We can't actually divide by the standard deviation using Core Image as far as I can tell - <span style="font-family: courier;">CIDivideBlendMode</span> doesn't seem to do a naive division like we want. However, because the standard deviation is constant across the whole image, we can hoist that division out of the dot product computation. The dot product results in a single scalar, and the standard deviation for an image is a single scalar, so we can just calculate these things independently, and then divide them on the CPU afterwards. 
No problem.</div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgbzI5yyBPDyCZSCAB9NGfbyGaTuUxEgXy-AMjEt0uhUHV9MTTRrbD6faF1zCRq2vGGl8xa8EIQ2snQgOPWuCt5Mwt5me4JNeB25z47n1Dqn0l1tviZNmzuuYtLQuGQOKdbbLPnjS4K301Vur7SYOEQUVGE9UWX0Z4iLcBxQ_kk30XC3stCDfKSncK5wA/s2723/IMG_0391.heic" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1446" data-original-width="2723" height="341" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgbzI5yyBPDyCZSCAB9NGfbyGaTuUxEgXy-AMjEt0uhUHV9MTTRrbD6faF1zCRq2vGGl8xa8EIQ2snQgOPWuCt5Mwt5me4JNeB25z47n1Dqn0l1tviZNmzuuYtLQuGQOKdbbLPnjS4K301Vur7SYOEQUVGE9UWX0Z4iLcBxQ_kk30XC3stCDfKSncK5wA/w640-h341/IMG_0391.heic" width="640" /></a></div><div><br /></div><h4 style="text-align: left;">Dot Product</h4><div><br /></div><div>Okay, so now we've got our zero-normalized images. Let's do a dot product and average the result! This is pretty easy too - a dot product is just <span style="font-family: courier;">CIFilter.multiplyCompositing</span>, and averaging the result is <span style="font-family: courier;">CIFilter.areaAverage</span>.</div><div><br /></div><div>Phew! 
All that Core Image work just to get a single value!</div></div><div><br /></div><h3 style="text-align: left;">Optical Flow</h3><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEipF2sArgXg9yAOnrvODbtp_DkeawFhlnQ4Mre7rkDPMJtPVlwGXjGoZysLudNJGKVe37YzGxx5DgSBmTpOFtdoCVKaxSYCO0i5PqnX21roJPfDKnnXWM-QX-0hd6YywqKxxg0GO2C3eTeafrLyIhCmjauXvj_1m2rPlruB2AfaABxlLHoDZ8qxLspJQQ/s1526/Screenshot%202022-11-09%20at%208.23.19%20PM.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1214" data-original-width="1526" height="510" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEipF2sArgXg9yAOnrvODbtp_DkeawFhlnQ4Mre7rkDPMJtPVlwGXjGoZysLudNJGKVe37YzGxx5DgSBmTpOFtdoCVKaxSYCO0i5PqnX21roJPfDKnnXWM-QX-0hd6YywqKxxg0GO2C3eTeafrLyIhCmjauXvj_1m2rPlruB2AfaABxlLHoDZ8qxLspJQQ/w640-h510/Screenshot%202022-11-09%20at%208.23.19%20PM.png" width="640" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiFiuwSQdVPnw7my0t8Li-9wIW0tO1Kvh2epLJngYFchBz-R4D9_JOZloL2eOijeLNiQozqmtFhGxW1TWLQCbkgiigSWZXH1B4Ime9UtMJEOkAtcZ099s3CE8l0_5Eriqv75M5pAce04cUM6EzvdPAmGCGJ-p0oyuGk28n1-5VR-awrep6svhstC8Tr8w/s1528/Screenshot%202022-11-09%20at%208.23.26%20PM.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1220" data-original-width="1528" height="510" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiFiuwSQdVPnw7my0t8Li-9wIW0tO1Kvh2epLJngYFchBz-R4D9_JOZloL2eOijeLNiQozqmtFhGxW1TWLQCbkgiigSWZXH1B4Ime9UtMJEOkAtcZ099s3CE8l0_5Eriqv75M5pAce04cUM6EzvdPAmGCGJ-p0oyuGk28n1-5VR-awrep6svhstC8Tr8w/w640-h510/Screenshot%202022-11-09%20at%208.23.26%20PM.png" width="640" /></a></div><br /><div><br /></div><div><div>Optical flow between 2 images produces an (x, y) displacement vector that indicates, for each pixel in the first image, where it moved to in the 
second image. I'm interested in this because as I rotate the camera around in the game, that should cause most of those displacements to be pointing in roughly the same direction across the whole image. So, the average of the optical flow should tell me if I'm moving the camera or not.</div><div><br /></div><div>On the other hand, if I'm walking forward in the game, then pixels at the top of the screen will move up, pixels on the right will move more to the right, etc. In that situation, the displacements will all cancel each other out! That's why I'm also interested in the standard deviation of the flow. If the average is 0, but the standard deviation is high, that means I'm walking forward (or backward). If the standard deviation is 0, but the average is high, that means I'm turning the camera.</div><div><br /></div><div>Calculating this is pretty straightforward. Apple's Vision framework contains <a href="https://developer.apple.com/documentation/vision/vngenerateopticalflowrequest">API to calculate optical flow</a>. We can then calculate its average and standard deviation using the same method we used above, in the cross correlation section.</div><div><br /></div><div>Optical flow is a 2-dimensional value - pixels can move in the X and Y dimension. I'm not really interested in the direction of the movement, though; I'm more interested in the amount of movement. 
So, after calculating the average and the standard deviation, I take the magnitude of them, to turn them into scalar values.</div></div><div><br /></div><h3 style="text-align: left;">Average Luminance</h3><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh8hPYmSwctzmh1NFneOPr6VRDF3ekktu7BTvBwtUQ_ynzN4Vnh7xLEfDJdNe_CgeSc4IlQV1oH6LFYEOH_Aid8U-jUEmiy8hOdlT9xovHPADh0ogXvfJ95mwh26R3el4RA1nX3pj8lyT-itBrPWNufawlwtzctp3Lf3bpzltN510ZyEakgrgewvRsSLw/s1596/Screenshot%202022-11-09%20at%208.23.34%20PM.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1222" data-original-width="1596" height="490" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh8hPYmSwctzmh1NFneOPr6VRDF3ekktu7BTvBwtUQ_ynzN4Vnh7xLEfDJdNe_CgeSc4IlQV1oH6LFYEOH_Aid8U-jUEmiy8hOdlT9xovHPADh0ogXvfJ95mwh26R3el4RA1nX3pj8lyT-itBrPWNufawlwtzctp3Lf3bpzltN510ZyEakgrgewvRsSLw/w640-h490/Screenshot%202022-11-09%20at%208.23.34%20PM.png" width="640" /></a></div><br /><div><br /></div><div>This is pretty straightforward - in fact, we already calculated this above in the cross-correlation section. It's useful because the menus have a black background, so they are darker than regular gameplay. Also, the luminance of menus is very consistent, as opposed to regular gameplay, where luminance is going up and down all the time. So, just looking at a graph of average luminance, you can kind of already see where the menus are in the video.</div><div><br /></div><h2 style="text-align: left;">Partitioning</h2><div><div>Alright, now we've extracted 5 features from each frame of video. Each of these features is a single scalar value. 
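The zero-normalized cross-correlation feature described earlier boils down to straightforward arithmetic. Here it is as a plain Python sketch over lists of luminance values - an illustration of the math, not the actual Core Image pipeline:

```python
import math

def zncc(a, b):
    # Mean, population standard deviation, zero-normalize, dot product,
    # then divide by the number of pixels.
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    std_a = math.sqrt(sum((x - mean_a) ** 2 for x in a) / n)
    std_b = math.sqrt(sum((x - mean_b) ** 2 for x in b) / n)
    norm_a = [(x - mean_a) / std_a for x in a]
    norm_b = [(x - mean_b) / std_b for x in b]
    return sum(x * y for x, y in zip(norm_a, norm_b)) / n

frame1 = [0.2, 0.4, 0.6, 0.8]  # luminance values of one frame
frame2 = [0.3, 0.5, 0.7, 0.9]  # the same frame, uniformly brighter
print(zncc(frame1, frame2))    # ~1.0: only relative changes count
```

The uniformly-brighter frame still scores 1.0, which is exactly the property the zero-normalization buys.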
The next task is to try to use statistical analysis to determine which parts of the video are the boring parts that I should cut out, and which parts are full of action and should be kept in.</div><div><br /></div><div>Originally, I thought I could just use cross-correlation to do this. I thought that if the cross-correlation between adjacent frames is low, that means I've cut to a menu or something, and that would be a good place to cut the video up. However, this turned out not to work very well, because menus actually come on screen with an animation, so they don't actually have low cross-correlation. Also, regular gameplay has a bunch of flashes in it (things explode, the camera can turn quickly, visual effects distort the screen, etc.).</div><div><br /></div><div>Instead, I wanted to model the data I had gathered using a piecewise constant function. E.g. during gameplay, the 5 features will adhere to a particular distribution, and during menus or boring parts, the 5 features will adhere to a different distribution. I'm modelling these distributions as normal distributions, but with different means. I'm trying to partition the data by time, and calculate a new normal distribution for each partition, such that each partition's distribution fits its data as well as possible. The trick here is to find the partitioning points. I'm looking for a statistical method of finding places where the data is discontinuous.</div><div><br /></div><div>Originally, I thought this would be a classic K-means clustering problem, but after implementing it, it turned out not to work very well. No matter how hard I tried, the partitions overlapped a lot, and looked pretty random. 
So that didn't work.</div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEijwBq1hUO6m5YYyVxbMcEzr4kFQsshha3Q2CjsDwvvmn6weCmi56DwIT7j4UkIwpdvv3XycfQWVJ28lTlfyTFo8G1ShvmdI40wN4WxkZrQ8S7Ptqkb3Z1GH55druFqfwgh4jjqdery32s_Qb7ed7sdv5G6lbi8Yz_ON1p_sojTLMQswsClkyr31UvQ_Q/s1978/Screenshot%202022-11-09%20at%208.44.58%20PM.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1130" data-original-width="1978" height="366" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEijwBq1hUO6m5YYyVxbMcEzr4kFQsshha3Q2CjsDwvvmn6weCmi56DwIT7j4UkIwpdvv3XycfQWVJ28lTlfyTFo8G1ShvmdI40wN4WxkZrQ8S7Ptqkb3Z1GH55druFqfwgh4jjqdery32s_Qb7ed7sdv5G6lbi8Yz_ON1p_sojTLMQswsClkyr31UvQ_Q/w640-h366/Screenshot%202022-11-09%20at%208.44.58%20PM.png" width="640" /></a></div><br /><div><br /></div><div><br /></div><h3 style="text-align: left;">Bayesian Information Criterion</h3><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjuqa8-vsT7Ya9t9jPJKoyg2deajaio099BBsq5iZelritWl5tNOlT6Etcb2UMrx2-jfZJ88jBwI9QDswMP7W-jRaXh5VWPsgWCYIzIaFiE8KtbZ-aUPFDykKShhNsXVDzvOREefVBBl-uKS4rq8mm21H7PP94XZndOt1UM9GRzzPog9_TJU5xszfp9vA/s1626/Screenshot%202022-11-09%20at%208.30.24%20PM.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1248" data-original-width="1626" height="492" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjuqa8-vsT7Ya9t9jPJKoyg2deajaio099BBsq5iZelritWl5tNOlT6Etcb2UMrx2-jfZJ88jBwI9QDswMP7W-jRaXh5VWPsgWCYIzIaFiE8KtbZ-aUPFDykKShhNsXVDzvOREefVBBl-uKS4rq8mm21H7PP94XZndOt1UM9GRzzPog9_TJU5xszfp9vA/w640-h492/Screenshot%202022-11-09%20at%208.30.24%20PM.png" width="640" /></a></div><br /><div><br /></div><div>Next, I discovered something called the <a href="https://en.wikipedia.org/wiki/Bayesian_information_criterion">Bayesian information criterion</a> (BIC). 
This comes from the science of model selection, which is essentially what I'm trying to do. I have a bunch of data, and I have a family of models in mind (disjoint normal distributions, one for each partition) and I'm trying to select among the family for the best model for my data.</div><div><br /></div><div>The BIC is a measure of how well data fits a model. Importantly, though, it tries to mitigate overfitting. For example, in my data, if every data point was in its own partition, then the model would perfectly fit the data; but this wouldn't actually solve my problem. The Bayesian information criterion has 2 terms - one that reflects how well the data fits the model, and another one that reflects how many parameters the model has. The better the data fits the model, the better the BIC; but the more parameters in the model, the worse the BIC. It's essentially a way of balancing fitting vs overfitting.</div><div><br /></div><div>There are 2^(n-1) different ways of partitioning n values into contiguous runs, and for me, n is the number of frames in 77 hours of video. So exhaustively calculating the BIC for every possible partitioning is clearly too expensive. I instead opted to use a greedy approach. Given a particular partitioning, we can try to add a new split at every location (which is <span style="font-family: courier;">O(n)</span> locations), and calculate the BIC as-if we split at that position. We then pick the location which results in the best BIC, and add it into the partitioning. We keep doing this until we find there is no single new splitting location which will improve the BIC (because adding another split would overfit the model).</div><div><br /></div><div>If you preprocess the data to precalculate prefix-sums, you can answer "sum all the values from here to there" queries in <span style="font-family: courier;">O(1)</span> time. 
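The prefix-sum trick looks like this (a generic Python sketch, not code from the project):

```python
def make_prefix_sums(values):
    # prefix[i] holds the sum of values[0..i-1], so prefix has n+1 entries.
    prefix = [0.0]
    for v in values:
        prefix.append(prefix[-1] + v)
    return prefix

def range_sum(prefix, start, end):
    # Sum of values[start..end-1], answered in O(1).
    return prefix[end] - prefix[start]

data = [3.0, 1.0, 4.0, 1.0, 5.0]
prefix = make_prefix_sums(data)
print(range_sum(prefix, 1, 4))  # 1 + 4 + 1 = 6.0
```

For the Gaussian BIC you'd keep two such tables - one of the values and one of their squares - since the mean and variance of any range fall out of those two sums.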
This allows you to compute the BIC for a single partition in <span style="font-family: courier;">O(1)</span> time (if you expand out the polynomial in the <a href="https://en.wikipedia.org/wiki/Bayesian_information_criterion#Gaussian_special_case">formula</a>).</div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiBiuMPK9R_COKIAfB1mAZfnBKNIQ6rsMYPVJxBdWdTCWT5_IdRVJBHvO8tHyLTH7uGAz6fKgPHJEAUqX-0-7NtCxzuVn26ZGhMXVnDdapgqQBTT-ohhLdjFTm8BblCVnTN3mKkfQACxOXrcWVAHYjUVK3n8nXZX_jPFsA_9wxxaSUlMrlu8HtQ38whxw/s2783/IMG_0392.heic" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="2599" data-original-width="2783" height="598" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiBiuMPK9R_COKIAfB1mAZfnBKNIQ6rsMYPVJxBdWdTCWT5_IdRVJBHvO8tHyLTH7uGAz6fKgPHJEAUqX-0-7NtCxzuVn26ZGhMXVnDdapgqQBTT-ohhLdjFTm8BblCVnTN3mKkfQACxOXrcWVAHYjUVK3n8nXZX_jPFsA_9wxxaSUlMrlu8HtQ38whxw/w640-h598/IMG_0392.heic" width="640" /></a></div><div><br /></div><div>Therefore, determining the BIC for a particular partitioning is <span style="font-family: courier;">O(number of partitions)</span>. Every time we pick a splitting point, the number of partitions increases by <span style="font-family: courier;">1</span>. We want to calculate a BIC once for each candidate splitting point, which is <span style="font-family: courier;">O(</span><span style="font-family: courier;">number of partitions</span><span style="font-family: courier;"> * number of frames)</span>. Even better, calculating the BIC at every candidate splitting point is an embarrassingly parallel problem, and we can parallelize this across all the cores in our machine. 
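Putting the pieces together, the greedy splitting loop might look something like this. This is a simplified, single-threaded Python sketch with hypothetical parameter choices (the penalty weighting and the variance guard are assumptions, and the real implementation is parallel):

```python
import math

def segment_cost(p1, p2, start, end):
    # n * ln(variance) for values[start:end], computed from prefix sums
    # of the values (p1) and of their squares (p2).
    n = end - start
    s1 = p1[end] - p1[start]
    s2 = p2[end] - p2[start]
    variance = max(s2 / n - (s1 / n) ** 2, 1e-12)  # guard against log(0)
    return n * math.log(variance)

def total_bic(p1, p2, boundaries, n, penalty):
    edges = [0] + boundaries + [n]
    fit = sum(segment_cost(p1, p2, a, b) for a, b in zip(edges, edges[1:]))
    return fit + penalty * (len(boundaries) + 1)

def greedy_partition(values, penalty_weight=10.0, params_per_segment=2):
    n = len(values)
    p1, p2 = [0.0], [0.0]
    for v in values:
        p1.append(p1[-1] + v)
        p2.append(p2[-1] + v * v)
    penalty = penalty_weight * params_per_segment * math.log(n)
    boundaries = []
    best = total_bic(p1, p2, boundaries, n, penalty)
    while True:
        candidates = [
            (total_bic(p1, p2, sorted(boundaries + [i]), n, penalty), i)
            for i in range(1, n) if i not in boundaries
        ]
        if not candidates:
            return boundaries
        score, split = min(candidates)
        if score >= best:
            return boundaries  # no single new split improves the BIC
        best = score
        boundaries = sorted(boundaries + [split])

# Two regimes with different means: the first (and only) split lands at the jump.
data = [0.0, 1.0] * 25 + [10.0, 11.0] * 25
print(greedy_partition(data))  # [50]
```

The inner `total_bic` call recomputes every segment for clarity; in practice only the segment being split changes, which is where the O(1)-per-candidate cost comes from.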
It turns out this is fast enough - the longest single video I had is 5 hours long, and this algorithm completed on that video in around 10 minutes on my 20-core machine.</div><div><br /></div><div>There are 2 tweaks that I ended up having to do to the above:</div><div><ol style="text-align: left;"><li>The BIC assumes that the data you have is one-dimensional. However, my data is 5-dimensional; I extracted 5 features from each frame of the video, so each data point has 5 components. There may be a way to generalize the BIC to higher dimensions, but I instead opted to do something similar: for each 5-dimensional data point, create a single one-dimensional synthetic data point that represents it. This is meaningless, but isn't without precedent: there are lots of examples of people using this approach to average together multiple benchmarks into a single synthetic score. I opted to combine the 5 values using the geometric mean, because that is sensitive to ratios of the data points, rather than the magnitudes of the data points themselves. (Arithmetic mean is definitely wrong, because calculating the arithmetic mean involves adding together the 5 values, but the 5 values have different units, so they can't be added.)</li><ol><li>I also decided to use <span style="font-family: courier;">1 - cross correlation</span> instead of cross correlation itself, because all the other values have a baseline near 0, but cross correlation has a baseline near 1 (because most frames are similar to their adjacent frames). This makes all the values behave a bit more similarly, and makes ratios more meaningful.</li></ol><li>The BIC formula involves a tradeoff between fitting the data and overfitting the data. The way overfitting is measured is that it's proportional to the number of parameters in the model. For me, each different partition has (I believe) 2 parameters: the mean of the data within that partition, and the constant variance of the errors. 
However, using this value causes the data to be overfitted significantly, so I instead multiplied that by 10, which produced a pretty good result. Doing it this way ended up with the average partition in each video being around 30 - 60 seconds long, regardless of the length of the video being partitioned. So that's a pretty cool result.</li></ol><h2 style="text-align: left;">Writing out the Result</h2></div></div>
I found that, if the average synthetic value is less than around <span style="font-family: courier;">0.12145</span>, then the partition was boring and should not be included.</div><div><br /></div><div>Writing the result is pretty straightforward - I'm using <span style="font-family: courier;">AVAssetWriter</span> and passing it frames from the input file (which was read using <span style="font-family: courier;">AVAssetReader</span>).</div></div><h1 style="text-align: left;">Results</h1><div><div>All this work seems, somewhat surprisingly, to give pretty good results. Not perfect, but better than I was expecting. In the first few hours of gameplay that I reviewed, it neatly cut out:</div><div><ol style="text-align: left;"><li>A section when I was reading an in-game lore book thing (it was just showing text on the screen for a minute or so)</li><li>A section when the game was paused</li><li>A section when I was fussing with my inventory for a few minutes</li><li>A section when I was looking at my character in the mirror</li></ol></div><div>The remarkable part of this is that it cut out all these things wholesale: from right when they started, to right when they ended. So you see the character walk into the elevator, and then they're immediately walking out of the elevator. And it didn't cut out any of the action or interesting parts. It also seems to have some resistance to durations of events: there was a point when I read a different in-game lore book, but I only read it for a few seconds, and it kept that part in - presumably because it wasn't worth another partition.</div></div>Litherumhttp://www.blogger.com/profile/12738405376090442005noreply@blogger.com1tag:blogger.com,1999:blog-8778351438463999796.post-15673396232695469582021-05-12T23:41:00.003-07:002021-05-17T23:33:29.934-07:00Understanding CVDisplayLink<p>I found it actually somewhat difficult to understand how to use <span style="font-family: courier;">CVDisplayLink</span>. 
But, after a while of playing around with it, I think I've got a pretty good handle on it. It's not too complicated.</p><p>The main use of a <span style="font-family: courier;">CVDisplayLink </span>is to have a callback that runs once per vsync of a screen. It's also stateful, so you can stop and start the callback stream.</p><h3 style="text-align: left;">Creation</h3><p>When you create one of these objects, you have to tell the system which screen to match - because different screens can have different refresh rates. The type <span style="font-family: courier;">CVDisplayLink </span>accepts to do this is <span style="font-family: courier;">CGDirectDisplayID</span>. You can get this from an <span style="font-family: courier;">NSScreen*</span> as follows:</p><p><span style="font-family: courier;">NSDictionary<NSDeviceDescriptionKey, id> *deviceDescription = theScreen.deviceDescription;</span></p><p><span style="font-family: courier;">NSNumber *directDisplayIDNumber = deviceDescription[@"NSScreenNumber"];</span></p><p><span style="font-family: courier;">CGDirectDisplayID directDisplay = directDisplayIDNumber.unsignedIntValue;</span></p><p>Then, you can use <span style="font-family: courier;">CVDisplayLinkCreateWithCGDisplay()</span> to create the object for that display.</p><h3 style="text-align: left;">Setup</h3><p>The setup can be either a block or a C function. The block doesn't need a <span style="font-family: courier;">void* userInfo</span> object because that context is implicitly captured by the block. 
So, you just say:</p><p><span style="font-family: courier;">CVDisplayLinkSetOutputHandler(displayLink, ^CVReturn (CVDisplayLinkRef displayLink, const CVTimeStamp *inNow, const CVTimeStamp *inOutputTime, CVOptionFlags flagsIn, CVOptionFlags *flagsOut) {</span></p><p><span style="font-family: courier;"> ...</span></p><p><span style="font-family: courier;"> return kCVReturnSuccess;</span></p><p><span style="font-family: courier;">});</span></p><p>And then you start it with just <span style="font-family: courier;">CVDisplayLinkStart(displayLink);</span> Easy peasy. There are also functions for stopping, retaining, and releasing the <span style="font-family: courier;">CVDisplayLink</span>.</p><h3 style="text-align: left;">Interpreting the Arguments</h3><p>It actually took me quite a while to figure out what each of the arguments means. The docs say that <span style="font-family: courier;">flagsIn </span>and <span style="font-family: courier;">flagsOut </span>are 0, and the <span style="font-family: courier;">displayLink </span>is the <span style="font-family: courier;">CVDisplayLink </span>that you started, so there are only really two interesting arguments: <span style="font-family: courier;">inNow</span>, and <span style="font-family: courier;">inOutputTime</span>, both of which are of type <span style="font-family: courier;">CVTimeStamp</span>. <span style="font-family: courier;">inNow </span>represents the time that this callback is being run, and <span style="font-family: courier;">inOutputTime </span>represents the time that anything you draw in the callback is supposed to show up at.</p><p>So, let's dig into <span style="font-family: courier;">CVTimeStamp</span>. 
The <span style="font-family: courier;">version </span>and <span style="font-family: courier;">reserved </span>fields are 0, and the flags field <a href="https://developer.apple.com/documentation/corevideo/cvtimestampflags?language=objc">tells you</a> which of the fields in the <span style="font-family: courier;">CVTimeStamp </span>are valid. I don't know what SMPTE time is, but it never seems to be set/valid, so I'm going to ignore that one. So these are the ones that are remaining:</p><p></p><ul style="text-align: left;"><li><span style="font-family: courier;">hostTime</span></li><li><span style="font-family: courier;">rateScalar</span></li><li><span style="font-family: courier;">videoRefreshPeriod</span></li><li><span style="font-family: courier;">videoTime</span></li><li><span style="font-family: courier;">videoTimeScale</span></li></ul><p></p><p>The thing you have to realize is that there are two timelines happening concurrently: "host" time and "video" time. So, a "point" in time actually has two different representations: one for each of the timelines.</p><p>The <span style="font-family: courier;">hostTime </span>field uses the same tick count that <span style="font-family: courier;">mach_absolute_time()</span> uses. To convert it to seconds, you have to use <span style="font-family: courier;">mach_timebase_info()</span>. And, the "meaning" of the <span style="font-family: courier;">hostTime </span>field is the current time as measured by your application - exactly what <span style="font-family: courier;">mach_absolute_time()</span> returns.</p><p>The <span style="font-family: courier;">videoTime </span>field does not use those same tick counts. Instead, it uses the <span style="font-family: courier;">videoTimeScale </span>field. It's a rational number: <span style="font-family: courier;">videoTime </span>/ <span style="font-family: courier;">videoTimeScale </span>= seconds. 
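As a toy illustration of the video timeline's rational-number arithmetic (with made-up field values - real timestamps and timescales will differ):

```python
# Hypothetical CVTimeStamp-style fields for two adjacent callbacks.
video_time_scale = 600          # ticks per second on the video timeline
video_refresh_period = 10       # ticks between vsyncs: 10/600 = 1/60 second
video_time_frame1 = 120_000
video_time_frame2 = video_time_frame1 + video_refresh_period

# videoTime / videoTimeScale = seconds, so the delta between callbacks is:
seconds_between_frames = (video_time_frame2 - video_time_frame1) / video_time_scale
print(seconds_between_frames)   # 0.01666... i.e. 1/60 of a second
```

The host timeline, by contrast, needs mach_timebase_info() to convert its ticks to seconds.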
<span style="font-family: courier;">videoRefreshPeriod </span>is a rational number too, using the same denominator, but it represents the delta between adjacent video frames.</p><p>For <span style="font-family: courier;">CVDisplayLink</span>, the "video" time represents time as measured by vsyncs. You can think of vsyncs as an independent clock - it ticks every so-often, and those ticks don't have to be in exact cadence with any of the other clocks on the system. They're supposed to be, but when you actually measure them, they won't perfectly line up, because of course nothing is that perfect. So, if <span style="font-family: courier;">videoRefreshPeriod </span>/ <span style="font-family: courier;">videoTimeScale </span>equals 1/60, and you record adjacent frames' <span style="font-family: courier;">hostTime </span>and convert them to seconds using <span style="font-family: courier;">mach_timebase_info()</span>, you'll get something that's close to 1/60, but it won't be exact, because nothing is ever that exact all the time.</p><p>So that's what <span style="font-family: courier;">rateScalar </span>tries to measure. It's the only field that is floating point, and it measures the speed of the video timeline relative to the speed of the host timeline. Ideally, it would always be 1.0, but, of course, nothing is ever that perfect. It's not sensitive to workload, just as time doesn't dilate when you start asking your computer to do some work.</p><p>The video time is time based on vsyncs, not time based on the window server render loop or the core animation render loop. If some other application loads up a big Core Animation scene, your <span style="font-family: courier;">CVDisplayLink </span>isn't going to tick slower.</p><p>Also, I assume the fact that <span style="font-family: courier;">videoRefreshPeriod </span>is passed into each callback indicates that videos can change their refresh rate ... 
but I'm not sure.</p>Litherumhttp://www.blogger.com/profile/12738405376090442005noreply@blogger.com1tag:blogger.com,1999:blog-8778351438463999796.post-92233128342377579852019-03-07T22:26:00.002-08:002019-03-07T22:44:17.952-08:00Addition Font<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjgxdfjoWM1n1tNeEoun7jw1SWH-wOtVM79cRwa7BCoAT9Q2znYRKeg18W352OartZUCEW_VEn2plVA9BcfdDFZWefagJ0MADa5gIAQEHSGtdcwcPGyxVmb_I-ZFDq6QLeXDXzc5uJGHjM1/s1600/Screen+Shot+2019-03-07+at+10.34.48+PM.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="471" data-original-width="1587" height="94" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjgxdfjoWM1n1tNeEoun7jw1SWH-wOtVM79cRwa7BCoAT9Q2znYRKeg18W352OartZUCEW_VEn2plVA9BcfdDFZWefagJ0MADa5gIAQEHSGtdcwcPGyxVmb_I-ZFDq6QLeXDXzc5uJGHjM1/s320/Screen+Shot+2019-03-07+at+10.34.48+PM.png" width="320" /></a></div>
<br />
<h3>
Shaping</h3>
<br />
When laying out text using a font file, the code points in the string are mapped one-to-one to glyphs inside the file. A glyph is a little picture inside the font file, and is identified by an ID, which is a number from 0 to 65535. However, there’s a step after character mapping but before rasterization: shaping.<br />
<br />
For example, one application of shaping is seen in Arabic text. In Arabic, each letter has four different forms, depending on which letters are next to it. For example, two letters in their isolated form look like ف and ي but as soon as you put them together, they form new shapes and look like في. This type of modification isn’t possible if characters are naively mapped to glyphs and then rasterized directly. Instead, there needs to be a step in the middle to modify the glyph forms so the correct thing is rasterized.<br />
<br />
This “middle step” is called shaping, and is implemented by three tables inside the OpenType font file: GSUB, GPOS, and GDEF. Let’s consider GSUB alone.<br />
<br />
<h3>
GSUB</h3>
<br />
The GSUB table, or “glyph substitution” table, is designed to let font authors replace glyphs with other glyphs. It describes a transformation where the input is a sequence of glyphs and the output is a different sequence of glyphs. It is made up of a collection of constituent “lookup tables,” each of which has a “type.”<br />
<br />
Type 1 (“single substitution”) provides a map from glyph to glyph. This is used, for example, when someone enables the ‘swsh’ (swash) feature: the font can substitute the regular ampersand with a fancy ampersand. In that situation, the map would contain a mapping from the regular ampersand to the fancy ampersand (possibly alongside other mappings, too).<br />
<br />
Type 2 (“multiple substitution”) provides a map from glyph to sequence-of-glyphs. This is used, for example, if diacritic (accent) marks are represented as separate glyphs inside the font. The font can replace the “è” glyph with the “e” glyph followed by the ◌̀ glyph (and then the GPOS table later can position the two glyphs physically on top of each other).<br />
<br />
Type 4 (“ligature substitution”) provides a map of sequence-of-glyphs to single glyph (the opposite of type 2). This is used for ligatures, so if you have a fancy “<span style="font-family: "zapfino";">ffi</span>” ligature, you can represent all three of those letters in the same fancy glyph.<br />
<br />
Type 5 (“contextual substitution”) is special. It doesn’t do any replacements directly, but instead maps a sequence of glyphs to a list of other tables that should be applied at specific points in the glyph sequence. So it can say things like “in the glyph sequence ‘abcde’, apply table #7 at index 2, and then when you’re done with that, apply table #13 at index 4.” Tables #7 and #13 can be any of the types above, so you could use this table to say something like “swap out the ‘d’ for an ‘f’, but only if it appears in the sequence ‘abcde’.” This sort of thing is used to implement the “contextual alternates” feature.<br />
<br />
There are also a few other lookup types, but they’re not particularly relevant here, so I’m going to ignore them.<br />
<br />
So, the inputs to the text system are a set of features and an input string of glyphs (the characters have already been mapped to glyphs via the “cmap” table). Features are mapped to a set of lookup tables, each of which has one of the types listed above. Each of those lookup tables describes a map whose keys are sequences of glyphs, so the runtime iterates through the glyph sequence until it finds a sequence that’s an input to one of the tables. The runtime then performs that glyph replacement according to the rules of the table, and continues iterating.<br />
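To make that runtime loop concrete, here’s a toy model of it in Python. The lookup contents (a made-up ligature and a made-up swash substitution) are purely illustrative; they aren’t taken from any real font:

```python
# Toy model of the GSUB runtime: walk the glyph stream looking for a
# sequence that keys one of the active lookups, apply the replacement,
# and keep going. Lookups are applied in order, one pass each.
lookups = [
    {"type": "ligature", "map": {("f", "f", "i"): ("ffi",)}},   # type 4
    {"type": "single",   "map": {("&",): ("fancy&",)}},          # type 1
]

def shape(glyphs):
    glyphs = list(glyphs)
    for lookup in lookups:
        i = 0
        while i < len(glyphs):
            for key, replacement in lookup["map"].items():
                if tuple(glyphs[i:i + len(key)]) == key:
                    glyphs[i:i + len(key)] = replacement
                    break
            i += 1
    return glyphs

print(shape(["f", "f", "i", "x", "&"]))  # ['ffi', 'x', 'fancy&']
```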
<br />
<h3>
Turing Complete</h3>
<br />
So this is pretty cool, but it turns out that the contextual substitution lookup type is really powerful. This is because the table that it references can be itself, which means it can be recursive.<br />
<br />
Let’s pretend we have a lookup table named 42 (presumably because it’s the 42nd lookup table inside the font), and it’s a contextual substitution lookup table. This table maps glyph sequences to tuples of (lookup table to recurse to, offset in glyph sequence to recurse). Let’s say we design it with these two mappings:<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">Table42 {</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> A A : (Table42, 1);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> A B : (Table100, 1);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">}</span><br />
<br />
If the runtime is operating on a glyph sequence of “<span style="font-family: "courier new" , "courier" , monospace;">AAAAB</span>”, the first two “<span style="font-family: "courier new" , "courier" , monospace;">AA</span>”s will match the first rule, so then the system recurses and runs <span style="font-family: "courier new" , "courier" , monospace;">Table42</span> on the stream “<span style="font-family: "courier new" , "courier" , monospace;">AAAB</span>”. Then these first two “<span style="font-family: "courier new" , "courier" , monospace;">AA</span>”s will match, and so-on. This happens until you get to the end of the string, “<span style="font-family: "courier new" , "courier" , monospace;">AB</span>” matches, and then <span style="font-family: "courier new" , "courier" , monospace;">Table100</span> is run on the string “<span style="font-family: "courier new" , "courier" , monospace;">B</span>”.<br />
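Here’s that trace as a Python sketch. Real GSUB recursion is driven by (lookup, index) pairs inside a type 5 rule; since the post leaves <span style="font-family: "courier new" , "courier" , monospace;">Table100</span>’s contents unspecified, I’ve made it a hypothetical type 1 substitution of “B” for “C” just so the recursion produces a visible result:

```python
# Each contextual rule: if the pattern matches at index i, recurse into
# the named lookup at the given offset. Table100 is hypothetical here
# (a type 1 substitution B -> C), only to make the effect observable.
def apply_table42(glyphs, i):
    if glyphs[i:i + 2] == ["A", "A"]:
        apply_table42(glyphs, i + 1)   # rule "A A : (Table42, 1)"
    elif glyphs[i:i + 2] == ["A", "B"]:
        apply_table100(glyphs, i + 1)  # rule "A B : (Table100, 1)"

def apply_table100(glyphs, i):
    if i < len(glyphs) and glyphs[i] == "B":
        glyphs[i] = "C"

sequence = list("AAAAB")
apply_table42(sequence, 0)
print("".join(sequence))  # AAAAC
```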
<br />
This is “tail recursion,” and can be used to implement all different types of loops. Also, each mapping in the table acts as an “if” statement because it only executes if the pattern is matched.<br />
<br />
You can use the glyph stream as memory by reading and writing to it; that is, after all, what the shaping algorithm is designed to do. You can delete a glyph by using Type 2 to map it to an empty sequence. You can insert a glyph by using Type 2 to map the preceding glyph to a sequence of [itself, the new glyph you want to insert]. And, once you’ve inserted a glyph, you can check for its presence by using the “if” statements described above.<br />
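Those two memory operations can be modeled directly. In the font each one would be a type 2 lookup; plain list surgery in Python shows the idea:

```python
# Modeling the glyph stream as memory:
# delete: a type 2 lookup maps a glyph to the empty sequence
# insert: a type 2 lookup maps the preceding glyph to [itself, new glyph]
def delete_glyph(stream, i):
    stream[i:i + 1] = []

def insert_after(stream, i, new_glyph):
    stream[i:i + 1] = [stream[i], new_glyph]

stream = list("abc")
insert_after(stream, 1, "FLAG")   # ['a', 'b', 'FLAG', 'c']
delete_glyph(stream, 0)           # ['b', 'FLAG', 'c']
print(stream)  # ['b', 'FLAG', 'c']
```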
<br />
So that’s a pretty powerful virtual machine. I think the above is sufficient to prove Turing-completeness.<br />
<br />
<h3>
Caveat</h3>
<br />
So it turns out the example (“<span style="font-family: "courier new" , "courier" , monospace;">Table42</span>”) above doesn’t actually work in DirectWrite. This is because, in DirectWrite, inner matches have to be entirely contained within outer matches. So when the outer call to <span style="font-family: "courier new" , "courier" , monospace;">Table42</span> matched “<span style="font-family: "courier new" , "courier" , monospace;">AA</span>”, the inner call to <span style="font-family: "courier new" , "courier" , monospace;">Table42</span> can only match within that specific “<span style="font-family: "courier new" , "courier" , monospace;">AA</span>”. This means it’s impossible to, for example, find the first even glyphID and move it to the beginning. So, DirectWrite’s implementation isn’t Turing complete. However, it does work in HarfBuzz and CoreText, so those implementations are Turing complete.<br />
<br />
But even in HarfBuzz and CoreText, there are hard limits on the recursion depth. HarfBuzz sets its limit to 6. Therefore, the above example will only work on strings of length 7 or fewer. HarfBuzz is open source, though, so I simply used a custom build of HarfBuzz which bumps up this limit to 4 billion. This let me recurse to my heart’s content. A limit of 6 is probably a good thing; I don’t think users generally expect their text engines to be running arbitrary computation during layout. But I want to go beyond it.<br />
<br />
<h3>
DSL</h3>
<br />
After making the above realizations, I decided to try to implement a nontrivial algorithm using only the GSUB table in a font. I wanted to try to implement addition. The input glyph stream would be of the form “<span style="font-family: "courier new" , "courier" , monospace;">=1234+5678=</span>” and the shaping process would turn that string into “<span style="font-family: "courier new" , "courier" , monospace;">6912</span>”.<br />
<br />
When thinking about how to do this, I started jotting down some ideas on paper for what the lookup tables should be, and the things I was writing down were very similar to that “<span style="font-family: "courier new" , "courier" , monospace;">Table42</span>” example above. However, writing down tables of types 1, 2, and 4 was quite cumbersome, because what I really wanted to describe were things like “move this glyph to the beginning of the sequence” rather than individual insertions or deletions.<br />
<br />
I looked at the “fea” language, which is how these lookups are traditionally written by font designers. However, after reading the <a href="https://github.com/fonttools/fonttools/blob/cbd099522446a6815ae4015294a77b7c69788270/Lib/fontTools/feaLib/parser.py">parser</a>, it looks like it doesn’t support recursive or mutually-recursive lookups.<br />
<br />
So, I did what any good programmer does in this situation: I invented a new domain-specific language.<br />
<br />
The DSL has two types of statements. The first is a way of giving a set of glyphs a name. I wanted to be able to address all of the digits without having to write out every individual digit. So, there’s a statement that looks like this:<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">digit: 1 2 3 4 5 6 7 8 9 10;</span><br />
<br />
Note that those numbers are glyph IDs, not code points. In my specific font, the “0” character is mapped to glyph 1, “1” is mapped to glyph 2, etc.<br />
<br />
Then, you can describe a lookup using the syntax above:<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">digitMove {</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> digit digit: (1, digitMove), (1, digitMove2);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> digit plus: (1, digitMove2);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">}</span><br />
<br />
The lookup is named <span style="font-family: "courier new" , "courier" , monospace;">digitMove</span>, and it has two rules inside it. The first rule matches any two digits next to each other, and if they match, it invokes the lookup named <span style="font-family: "courier new" , "courier" , monospace;">digitMove</span> (which is the name of the current lookup, so this is recursive) at index 1, and then after that, invokes the lookup named <span style="font-family: "courier new" , "courier" , monospace;">digitMove2</span> at index 1.<br />
<br />
Each of these stanzas gets translated fairly trivially to a lookup of type 5.<br />
<br />
These rules are recursive, as above, but they need a terminal form so that the recursion will eventually end. Those are described like this:<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">flagPayload {</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> digit digit plus digit: flag \3 \0 \1 \2;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">}</span><br />
<br />
This rule is a terminal because there are no parentheses on the right side of the colon. The right side represents a glyph sequence to replace the match with. The values on the right side are either a literal glyph ID, a glyph set that contains exactly one glyph, or a backreference which starts with a backslash. “<span style="font-family: "courier new" , "courier" , monospace;">\3</span>” means “the glyph at index 3 (counting from zero) in the matched sequence”. So the rule above would turn the glyph sequence “<span style="font-family: "courier new" , "courier" , monospace;">34+5</span>” into the sequence “<span style="font-family: "courier new" , "courier" , monospace;">F534+</span>”. (The flag glyph is read and removed in a later stage of processing).<br />
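Applying a terminal rule is then just substitution with backreferences. A minimal sketch, where backreferences are plain integers and anything else is a literal glyph:

```python
# Build the replacement sequence for a matched span: integers are
# backreferences into the match, everything else is a literal glyph.
def apply_terminal(matched, replacement):
    return [matched[item] if isinstance(item, int) else item
            for item in replacement]

# The rule above: "digit digit plus digit : flag \3 \0 \1 \2"
result = apply_terminal(list("34+5"), ["F", 3, 0, 1, 2])
print("".join(result))  # F534+
```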
<br />
Translating one of these rules to lookups is nontrivial. I tried a few things, but ended up with the following design:<br />
<br />
For each output glyph, right to left:<br />
<br />
<ul>
<li>If it’s a backreference, duplicate the glyph it’s referencing, and perform a sequence of swaps to move the new glyph all the way to the right.</li>
<li>If it’s a literal glyph, insert it at the beginning, and perform a sequence of swaps to move it all the way to the right.</li>
</ul>
<br />
This means we need to have a way to do the following operations:<br />
<br />
<ul>
<li>Duplicate. This is a type 4 lookup that maps every glyph to a sequence of two of that glyph.</li>
<li>Swap. This has two pieces: an outer type 5 lookup that has a rule for every ordered pair of glyphs, and each rule invokes inner lookups of type 1 which replace each glyph of the pair with the other one. You need n of these inner (type 1) lookups (one per target glyph), allowing you to map any glyph to any other glyph. However, the encoding format allows each of these inner lookups to be encoded in constant space in the font, so the inner lookups don’t take much space; it’s the outer type 5 lookup that takes n^2 space.</li>
<li>Insert a literal. If you implemented this by simply making a type 2 that mapped every glyph to that same glyph + the literal, you would need n^2 space because there would be n of these tables. Instead, you can cut down the size by doing it in two phases: inserting a flag glyph (which is O(n) space using a lookup type 2) and mapping that glyph to any constant value (also O(n) space using a type 1).</li>
</ul>
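As a sketch of the swap construction (with arbitrary small integers standing in for glyph IDs): the outer type 5 lookup contributes one rule per ordered pair of glyphs, and each rule dispatches to two of the n constant-size inner type 1 lookups:

```python
# Sketch of the swap primitive. In the real font, become_k[k] is a
# constant-size type 1 "become glyph k" lookup, and swap() stands for the
# n^2-rule outer type 5 contextual lookup that picks which two to apply.
n = 4
become_k = [lambda stream, i, k=k: stream.__setitem__(i, k) for k in range(n)]

def swap(stream, i):
    a, b = stream[i], stream[i + 1]   # the outer rule matched the pair (a, b)
    become_k[b](stream, i)            # apply "become b" at offset 0
    become_k[a](stream, i + 1)        # apply "become a" at offset 1

stream = [2, 3, 1]
swap(stream, 0)
print(stream)  # [3, 2, 1]
```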
<br />
Above, I’m worried about space constraints in the file because pointers in the file are (in general) 2 bytes, meaning the maximum size that anything can be is 2^16 bytes. If n^2 space is needed, that means n can only be 2^8 = 256, which isn’t that big. Most fonts have on the order of 256 glyphs. Therefore, we need to reduce the places where we require O(n^2) space as much as possible. LookupType 7 helps somewhat, because it allows you to use 32-bit pointers in one specific place, but it only helps that one place.<br />
<br />
My font only has 14 glyphs in it, so I didn’t end up near any of these limits, but it’s still important to watch out for out-of-bounds problems.<br />
<br />
So, given all that, we can make a parser which builds an AST for the language, and we can build an intermediate representation which represents the bytes in the file, and we can make a lowering phase which lowers the AST to the IR. Then we can serialize the IR and write out the data to the file.<br />
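The front of that pipeline is straightforward. As one hypothetical fragment (the record shapes here are invented for illustration, not my actual compiler), parsing a glyph-set statement of the DSL might look like:

```python
# A minimal shape for the compiler front end described above: parse one
# DSL glyph-set statement into an AST node. Record layout is invented.
from dataclasses import dataclass

@dataclass
class GlyphSet:
    name: str
    glyphs: list

def parse_glyph_set(line):
    # e.g. "digit: 1 2 3 4 5 6 7 8 9 10;"
    name, rest = line.split(":")
    return GlyphSet(name.strip(), [int(g) for g in rest.strip(" ;").split()])

print(parse_glyph_set("digit: 1 2 3 4 5 6 7 8 9 10;"))
```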
<br />
<h3>
Addition</h3>
<br />
So, once the language was up and running, I had to actually write a program that represented addition. It works in four phases.<br />
<br />
First, define some glyphs:<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">digit: 1 2 3 4 5 6 7 8 9 10;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">flag: 13;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">plus: 11;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">equals: 12;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">digit0: 1;</span><br />
<br />
Then, parse the string. If the string didn’t match the form “<span style="font-family: "courier new" , "courier" , monospace;">=digits+digits=</span>” then I wanted nothing to happen. You can do this by recursing across the string, and if you find that it matches the pattern, insert a flag, and then when all the calls return, move the flag leftward.<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">parse {</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> equals digit: (1, parseLeft), (0, afterParse);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">}</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;">parseLeft {</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> digit digit: (1, parseLeft), (0, moveFlagAcrossDigit);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> digit plus digit: (2, parseRight), (0, moveFlagAcrossPlus);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">}</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;">parseRight {</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> digit digit: (1, parseRight), (0, moveFlagAcrossDigit);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> digit equals: flag \0 \1;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">}</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;">moveFlagAcrossDigit {</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> digit flag digit: \1 \0 \2;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">}</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;">moveFlagAcrossPlus {</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> digit plus flag digit: \2 \0 \1 \3;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">}</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;">afterParse {</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> equals flag: (0, removeFlag), (0, startDigitMove);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">}</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;">removeFlag {</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> equals flag: \0;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">}</span><br />
<br />
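The job of this parse phase, expressed outside the font for clarity, is just a form check; a regular expression is the obvious reference implementation:

```python
# Reference check for the parse phase: accept only strings of the form
# "=digits+digits=", and do nothing otherwise.
import re

def matches_form(s):
    return re.fullmatch(r"=\d+\+\d+=", s) is not None

print(matches_form("=1234+5678="))  # True
print(matches_form("1234+5678"))    # False
```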
The next step is to pair up the glyphs. For example, this would turn “<span style="font-family: "courier new" , "courier" , monospace;">1234+5678</span>” into “<span style="font-family: "courier new" , "courier" , monospace;">15263748+</span>”.<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">startDigitMove {</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> equals digit: (1, digitMove), (1, startPhase2), (0, removeEquals);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">}</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;">removeEquals {</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> equals digit: \1;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">}</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;">digitMove {</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> digit digit: (1, digitMove), (1, digitMove2);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> digit plus: (1, digitMove2);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">}</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;">digitMove2 {</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> digit digit: (1, digitMove2), (0, swapDigits);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> digit plus digit: (2, digitMove2), (0, digitMove3);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> digit plus equals: digit0 \0 \1 \2;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> plus digit: (1, digitMove2), (0, swapPlusDigit);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> plus equals: digit0 \0 \1;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> digit equals: \0 \1;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">}</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;">swapDigits {</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> digit digit: \1 \0;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">}</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;">digitMove3 {</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> digit plus digit: \2 \0 \1;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">}</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;">swapPlusDigit {</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> plus digit: \1 \0;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">}</span><br />
<br />
The next step is to see if there were any glyphs left over on the right side that didn’t get moved. This happens if the right side is longer than the left side. For example, if the input string is “<span style="font-family: "courier new" , "courier" , monospace;">12+3456</span>” we would now have “<span style="font-family: "courier new" , "courier" , monospace;">1526+34</span>”. We want to turn this into “<span style="font-family: "courier new" , "courier" , monospace;">03041526</span>”.<br />
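The intended net effect of the pairing and padding phases can be written as a short reference implementation (dropping the “+”, which the font removes along the way; the zero-padding of the shorter operand matches what the rules achieve with <span style="font-family: "courier new" , "courier" , monospace;">digit0</span>):

```python
# Reference implementation of the pairing phases: zero-pad the shorter
# operand, then interleave the two operands' digits pairwise.
def pair_up(left, right):
    width = max(len(left), len(right))
    left, right = left.rjust(width, "0"), right.rjust(width, "0")
    return "".join(l + r for l, r in zip(left, right))

print(pair_up("1234", "5678"))  # 15263748
print(pair_up("12", "3456"))    # 03041526
```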
<br />
<span style="font-family: "courier new" , "courier" , monospace;">startPhase2 {</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> digit digit: (0, phase2), (0, beginPhase3);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">}</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;">phase2 {</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> digit digit: (0, phase2Move), (0, checkPayload);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">}</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;">phase2Move {</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> digit digit digit digit: (2, phase2Move), (0, movePayload);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> digit digit plus equals: \0 \1 \2 \3;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> digit digit plus digit: (3, payload), (0, flagPayload);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">}</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;">payload {</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> digit digit: (1, payload), (0, swapDigits);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> digit equals: \0 \1;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">}</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;">flagPayload {</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> digit digit plus digit: flag \3 \0 \1 \2;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">}</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;">movePayload {</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> digit digit flag digit: \2 \3 \0 \1;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">}</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;">checkPayload {</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> flag digit digit digit: (0, rearrangePayload), (0, phase2);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">}</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;">rearrangePayload {</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> flag digit digit digit: digit0 \1 \2 \3;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">}</span><br />
<br />
The last step is to actually perform the addition. This works like a ripple carry adder. We want to take the glyphs two-at-a-time, and add them, and produce a carry. Then the next pair of glyphs will add, and include the carry. We start the process by introducing a carry = 0.<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">beginPhase3 {</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> digit digit: (0, phase3), (0, removeZero);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">}</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;">phase3 {</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> digit digit digit digit: (2, phase3), (0, addPair);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> digit digit plus equals: (0, insertCarry), (0, addPair);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">}</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;">insertCarry {</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> digit digit plus equals: \0 \1 digit0;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">}</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;">removeZero {</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> digit0 digit: (1, removeZero), (0, removeSingleZero);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">}</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;">removeSingleZero {</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> digit0 digit: \1;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">}</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;">addPair {</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> 1 1 1: 1 1;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> 1 1 2: 1 2;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> 1 2 1: 1 2;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> 1 2 2: 1 3;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> 1 3 1: 1 3;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> 1 3 2: 1 4;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> … more here</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">}</span><br />
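The full <span style="font-family: "courier new" , "courier" , monospace;">addPair</span> table can be generated mechanically. My reading of the rules above is that glyph d+1 stands for digit d and that the first glyph of each rule is the incoming carry; under that assumption, a generator plus a spot-check against the listed rules looks like this:

```python
# Generate the addPair rules: keys are (carry glyph, digit glyph, digit
# glyph), values are (new carry glyph, sum digit glyph), where glyph d+1
# encodes digit d. The carry ordering is my reading of the rules.
def add_pair_rules():
    rules = {}
    for carry in (0, 1):
        for a in range(10):
            for b in range(10):
                s = a + b + carry
                rules[(carry + 1, a + 1, b + 1)] = (s // 10 + 1, s % 10 + 1)
    return rules

rules = add_pair_rules()
print(rules[(1, 3, 2)])  # (1, 4), matching the post's "1 3 2: 1 4"
```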
<br />
<h3>
HarfBuzz</h3>
<br />
So I’m doing this whole thing using the HarfBuzz shaper, as described above. This is because it’s open source, so I can find where I’m hitting limits and increase them. It turned out that not only did I have to increase <span style="font-family: "courier new" , "courier" , monospace;">HB_MAX_NESTING_LEVEL</span> to <span style="font-family: "courier new" , "courier" , monospace;">4294967294</span>, but I was also running into other limits. I ended up just taking all the limits in <span style="font-family: "courier new" , "courier" , monospace;">hb-buffer.hh</span>, <span style="font-family: "courier new" , "courier" , monospace;">hb-machinery.hh</span>, and <span style="font-family: "courier new" , "courier" , monospace;">hb-ot-layout-common.hh</span> and increasing them by a factor of 10.<br />
<br />
There’s one more piece that was necessary to get it to work. Inside <span style="font-family: "courier new" , "courier" , monospace;">apply_lookup()</span> in <span style="font-family: "courier new" , "courier" , monospace;">hb-ot-layout-gsubgpos.hh</span>, there’s a section <span style="font-family: "courier new" , "courier" , monospace;">if (end <= int (match_positions[idx]))</span>. It looks to me like this section is detecting if a recursive call caused the glyph sequence to get shorter than the size of the match. Inside this block, it says <span style="font-family: "courier new" , "courier" , monospace;">/* There can't be any further changes. */ break;</span> which seems to stop the recursion (which seems incorrect to me, but I’m not a HarfBuzz developer, so I could be wrong). In order to get this whole system to work, I had to comment out the “<span style="font-family: "courier new" , "courier" , monospace;">break</span>” statement.<br />
<br />
So that’s it! After doing that, the system works, and the font correctly adds numbers. The font has 75 shaping rules and is 32KB large.<br />
<br />
The glyph paths (contours) were taken from the <a href="https://fontlibrary.org/en/font/retroscape">Retroscape</a> font.<br />
<br />
<a href="https://drive.google.com/file/d/14oNfNM1aYmgq3r8f50SmIHWQHz08HMWx/view?usp=sharing">Font file download</a>Litherumhttp://www.blogger.com/profile/12738405376090442005noreply@blogger.com69tag:blogger.com,1999:blog-8778351438463999796.post-23763444695768263792019-03-02T19:04:00.000-08:002019-03-02T19:10:38.976-08:00Wide color vs HDROver the past few years, there’s been something of a renaissance in display technology. It started with <a href="https://en.wikipedia.org/wiki/Retina_display">retina displays</a> and now is extending to <a href="https://developer.apple.com/videos/play/wwdc2016/712/">wide color</a> and <a href="https://support.microsoft.com/en-us/help/4040263/windows-10-hdr-advanced-color-settings">HDR</a>. Wide color has been added to Apple’s devices, and HDR support has arrived in Windows. These are similar technologies, but they aren’t the same.<br />
<br />
HDR allows you to display the same colors you could display without it, but at a higher luminosity (colloquially: brightness). This is sort of like the difference between one red lightbulb and two red lightbulbs. Looking at two red lightbulbs doesn’t change the color of the red; it’s just brighter.<br />
<br />
Wide color, on the other hand, lets you see colors that weren’t possible to see before. It makes it possible to display colors that are more saturated than could otherwise be shown.<br />
<br />
HDR monitors use the same color primaries as non-HDR monitors, but the luminosity of each of those primaries can grow beyond 1.0. On the other hand, wide color monitors use different, more saturated primaries.<br />
<br />
<canvas height="400" id="canvas2" width="400"></canvas>
<script id="vertexShader" type="x-shader/x-vertex">
uniform mat4 modelViewProjectionMatrix;
attribute vec3 position;
attribute vec3 color;
varying vec3 outColor;
void main() {
gl_Position = modelViewProjectionMatrix * vec4(position, 1);
outColor = color;
}
</script>
<script id="fragmentShader" type="x-shader/x-fragment">
precision mediump float;
varying vec3 outColor;
void main() {
gl_FragColor = vec4(outColor, 1);
}
</script>
<script>
function crossProduct(u, v) {
return [u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0]];
}
function unitVector(v) {
var length = Math.sqrt(Math.pow(v[0], 2) + Math.pow(v[1], 2) + Math.pow(v[2], 2));
return [v[0] / length, v[1] / length, v[2] / length];
}
function constructLookAtMatrix(eyePosition, centerPosition, upVector) {
var f = [centerPosition[0] - eyePosition[0], centerPosition[1] - eyePosition[1], centerPosition[2] - eyePosition[2]];
var fUnit = unitVector(f);
var upUnit = unitVector(upVector);
var s = crossProduct(fUnit, upUnit);
var sUnit = unitVector(s);
var u = crossProduct(sUnit, fUnit);
return DOMMatrix.fromMatrix({
m11: s[0], m21: s[1] , m31: s[2] , m41: -eyePosition[0],
m12: u[0], m22: u[1] , m32: u[2] , m42: -eyePosition[1],
m13: -fUnit[0], m23: -fUnit[1], m33: -fUnit[2], m43: -eyePosition[2],
m14: 0 , m24: 0 , m34: 0 , m44: 1,
});
}
function constructPerspectiveMatrix(yFieldOfView, aspectRatio, nearPlaneDistance, farPlaneDistance) {
var f = 1 / Math.tan(yFieldOfView / 2);
var m11 = f / aspectRatio;
var m22 = f;
var m33 = (farPlaneDistance + nearPlaneDistance) / (nearPlaneDistance - farPlaneDistance);
var m43 = (2 * farPlaneDistance * nearPlaneDistance) / (nearPlaneDistance - farPlaneDistance);
return DOMMatrix.fromMatrix({
m11: m11, m21: 0 , m31: 0 , m41: 0 ,
m12: 0 , m22: m22, m32: 0 , m42: 0 ,
m13: 0 , m23: 0 , m33: m33, m43: m43,
m14: 0 , m24: 0 , m34: -1 , m44: 1 ,
});
}
function constructModelViewProjectionMatrix(aspectRatio, angle) {
var modelToWorldMatrix = new DOMMatrix();
modelToWorldMatrix.rotateAxisAngleSelf(0, 1, 0, angle);
modelToWorldMatrix.translateSelf(-0.5, -0.5, -0.5);
var worldToCameraMatrix = constructLookAtMatrix([0, 0.15, 1], [0, 0, 0], [0, 1, 0]);
var projectionMatrix = constructPerspectiveMatrix(0.8 * (Math.PI / 2), aspectRatio, 0.1, 10);
return projectionMatrix.multiply(worldToCameraMatrix).multiply(modelToWorldMatrix);
}
function matrixToSequence(matrix) {
// Column major
return [matrix.m11, matrix.m12, matrix.m13, matrix.m14,
matrix.m21, matrix.m22, matrix.m23, matrix.m24,
matrix.m31, matrix.m32, matrix.m33, matrix.m34,
matrix.m41, matrix.m42, matrix.m43, matrix.m44];
}
function start() {
let canvas = document.getElementById("canvas2");
let context = canvas.getContext("webgl");
let vertexShaderElement = document.getElementById("vertexShader");
let fragmentShaderElement = document.getElementById("fragmentShader");
var vertexShader = context.createShader(context.VERTEX_SHADER);
context.shaderSource(vertexShader, vertexShaderElement.text);
context.compileShader(vertexShader);
let compiled = context.getShaderParameter(vertexShader, context.COMPILE_STATUS);
let fragmentShader = context.createShader(context.FRAGMENT_SHADER);
context.shaderSource(fragmentShader, fragmentShaderElement.text);
context.compileShader(fragmentShader);
compiled = context.getShaderParameter(fragmentShader, context.COMPILE_STATUS);
let program = context.createProgram();
context.attachShader(program, vertexShader);
context.attachShader(program, fragmentShader);
context.linkProgram(program);
let linked = context.getProgramParameter(program, context.LINK_STATUS);
let sRGBVertices = new Float32Array([
0, 0, 0, 0, 1, 0, // left front
0, 0, 0, 1, 0, 0, // bottom front
1, 0, 0, 1, 1, 0, // right front
0, 1, 0, 1, 1, 0, // top front
0, 0, 1, 0, 1, 1, // left back
0, 0, 1, 1, 0, 1, // bottom back
1, 0, 1, 1, 1, 1, // right back
0, 1, 1, 1, 1, 1, // top back
0, 1, 0, 0, 1, 1, // top left
0, 0, 0, 0, 0, 1, // bottom left
1, 1, 0, 1, 1, 1, // top right
1, 0, 0, 1, 0, 1 // bottom right
]);
let sRGBVertexBuffer = context.createBuffer();
context.bindBuffer(context.ARRAY_BUFFER, sRGBVertexBuffer);
context.bufferData(context.ARRAY_BUFFER, sRGBVertices, context.STATIC_DRAW);
let sRGBColors = new Float32Array([
0, 0, 0, 0, 1, 0, // left front
0, 0, 0, 1, 0, 0, // bottom front
1, 0, 0, 1, 1, 0, // right front
0, 1, 0, 1, 1, 0, // top front
0, 0, 1, 0, 1, 1, // left back
0, 0, 1, 1, 0, 1, // bottom back
1, 0, 1, 1, 1, 1, // right back
0, 1, 1, 1, 1, 1, // top back
0, 1, 0, 0, 1, 1, // top left
0, 0, 0, 0, 0, 1, // bottom left
1, 1, 0, 1, 1, 1, // top right
1, 0, 0, 1, 0, 1 // bottom right
]);
let sRGBColorBuffer = context.createBuffer();
context.bindBuffer(context.ARRAY_BUFFER, sRGBColorBuffer);
context.bufferData(context.ARRAY_BUFFER, sRGBColors, context.STATIC_DRAW);
let p3Vertices = new Float32Array([
0, 0, 0, -0.225, 1.042, -0.079, // left front
0, 0, 0, 1.225, -0.042, -0.02, // bottom front
1.225, -0.042, -0.02, 1.0, 1.0, -0.098, // right front
-0.225, 1.042, -0.079, 1.0, 1.0, -0.098, // top front
0.0, 0.0, 1.098, -0.225, 1.042, 1.019, // left back
0.0, 0.0, 1.098, 1.225, -0.042, 1.078, // bottom back
1.225, -0.042, 1.078, 1, 1, 1, // right back
-0.225, 1.042, 1.019, 1, 1, 1, // top back
-0.225, 1.042, -0.079, -0.225, 1.042, 1.019, // top left
0, 0, 0, 0.0, 0.0, 1.098, // bottom left
1.0, 1.0, -0.098, 1, 1, 1, // top right
1.225, -0.042, -0.02, 1.225, -0.042, 1.078 // bottom right
]);
let p3VertexBuffer = context.createBuffer();
context.bindBuffer(context.ARRAY_BUFFER, p3VertexBuffer);
context.bufferData(context.ARRAY_BUFFER, p3Vertices, context.STATIC_DRAW);
let p3Colors = new Float32Array([
1, 1, 1, 1, 1, 1, // left front
1, 1, 1, 1, 1, 1, // bottom front
1, 1, 1, 1, 1, 1, // right front
1, 1, 1, 1, 1, 1, // top front
1, 1, 1, 1, 1, 1, // left back
1, 1, 1, 1, 1, 1, // bottom back
1, 1, 1, 1, 1, 1, // right back
1, 1, 1, 1, 1, 1, // top back
1, 1, 1, 1, 1, 1, // top left
1, 1, 1, 1, 1, 1, // bottom left
1, 1, 1, 1, 1, 1, // top right
1, 1, 1, 1, 1, 1 // bottom right
]);
let p3ColorBuffer = context.createBuffer();
context.bindBuffer(context.ARRAY_BUFFER, p3ColorBuffer);
context.bufferData(context.ARRAY_BUFFER, p3Colors, context.STATIC_DRAW);
let hdrVertices = new Float32Array([
0.0, 0.0, 0.0, -0.8406297072082757, 1.9076076353654263, -0.052351971805095654, // left front
0.0, 0.0, 0.0, 2.554871967813559, -0.09029165014876057, -0.026618440567143256, // bottom front
2.554871967813559, -0.09029165014876057, -0.026618440567143256, 1.7142421158939605, 1.817316070188582, -0.07897041518986225, // right front
-0.8406297072082757, 1.9076076353654263, -0.052351971805095654, 1.7142421158939605, 1.817316070188582, -0.07897041518986225, // top front
0.040655566745996685, -0.05532626436054705, 1.839225462436676, -0.799974313187599, 1.8522814843535422, 1.7868734956026078, // left back
0.040655566745996685, -0.05532626436054705, 1.839225462436676, 2.5955275093317036, -0.14561788636446005, 1.8126070237517355, // bottom back
2.5955275093317036, -0.14561788636446005, 1.8126070237517355, 1.7548976064920434, 1.7619898903012274, 1.7602550538778303, // right back
-0.799974313187599, 1.8522814843535422, 1.7868734956026078, 1.7548976064920434, 1.7619898903012274, 1.7602550538778303, // top back
-0.8406297072082757, 1.9076076353654263, -0.052351971805095654, -0.799974313187599, 1.8522814843535422, 1.7868734956026078, // top left
0.0, 0.0, 0.0, 0.040655566745996685, -0.05532626436054705, 1.839225462436676, // bottom left
1.7142421158939605, 1.817316070188582, -0.07897041518986225, 1.7548976064920434, 1.7619898903012274, 1.7602550538778303, // top right
2.554871967813559, -0.09029165014876057, -0.026618440567143256, 2.5955275093317036, -0.14561788636446005, 1.8126070237517355, // bottom right
]);
let hdrVertexBuffer = context.createBuffer();
context.bindBuffer(context.ARRAY_BUFFER, hdrVertexBuffer);
context.bufferData(context.ARRAY_BUFFER, hdrVertices, context.STATIC_DRAW);
let hdrColors = new Float32Array([
0, 1, 1, 0, 1, 1, // left front
0, 1, 1, 0, 1, 1, // bottom front
0, 1, 1, 0, 1, 1, // right front
0, 1, 1, 0, 1, 1, // top front
0, 1, 1, 0, 1, 1, // left back
0, 1, 1, 0, 1, 1, // bottom back
0, 1, 1, 0, 1, 1, // right back
0, 1, 1, 0, 1, 1, // top back
0, 1, 1, 0, 1, 1, // top left
0, 1, 1, 0, 1, 1, // bottom left
0, 1, 1, 0, 1, 1, // top right
0, 1, 1, 0, 1, 1 // bottom right
]);
let hdrColorBuffer = context.createBuffer();
context.bindBuffer(context.ARRAY_BUFFER, hdrColorBuffer);
context.bufferData(context.ARRAY_BUFFER, hdrColors, context.STATIC_DRAW);
let laptopVertices = new Float32Array([
0.0, 0.0, 0.0, -0.0002792076006530794, 0.7211763429544866, 0.0006004979923367598, // left front
0.0, 0.0, 0.0, 0.7263884289287031, 0.00030994328223168564, 0.00013657837584614765, // bottom front
0.7263884289287031, 0.00030994328223168564, 0.00013657837584614765, 0.7261091228932143, 0.7214863152667879, 0.0007370786458253953, // right front
-0.0002792076006530794, 0.7211763429544866, 0.0006004979923367598, 0.7261091228932143, 0.7214863152667879, 0.0007370786458253953, // top front
0.0016182385280728573, -0.0010248487599194105, 0.7190324349924921, 0.0013389428615571686, 0.7201515444993972, 0.7196329519808291, // left back
0.0016182385280728573, -0.0010248487599194105, 0.7190324349924921, 0.7280066970169545, -0.0007149118453264602, 0.7191690410017967, // bottom back
0.7280066970169545, -0.0007149118453264602, 0.7191690410017967, 0.7277272919297222, 0.7204615152657031, 0.7197694932579994, // right back
0.0013389428615571686, 0.7201515444993972, 0.7196329519808291, 0.7277272919297222, 0.7204615152657031, 0.7197694932579994, // top back
-0.0002792076006530794, 0.7211763429544866, 0.0006004979923367598, 0.0013389428615571686, 0.7201515444993972, 0.7196329519808291, // top left
0.0, 0.0, 0.0, 0.0016182385280728573, -0.0010248487599194105, 0.7190324349924921, // bottom left
0.7261091228932143, 0.7214863152667879, 0.0007370786458253953, 0.7277272919297222, 0.7204615152657031, 0.7197694932579994, // top right
0.7263884289287031, 0.00030994328223168564, 0.00013657837584614765, 0.7280066970169545, -0.0007149118453264602, 0.7191690410017967, // bottom right
]);
let laptopVertexBuffer = context.createBuffer();
context.bindBuffer(context.ARRAY_BUFFER, laptopVertexBuffer);
context.bufferData(context.ARRAY_BUFFER, laptopVertices, context.STATIC_DRAW);
let laptopColors = new Float32Array([
1, 0, 1, 1, 0, 1, // left front
1, 0, 1, 1, 0, 1, // bottom front
1, 0, 1, 1, 0, 1, // right front
1, 0, 1, 1, 0, 1, // top front
1, 0, 1, 1, 0, 1, // left back
1, 0, 1, 1, 0, 1, // bottom back
1, 0, 1, 1, 0, 1, // right back
1, 0, 1, 1, 0, 1, // top back
1, 0, 1, 1, 0, 1, // top left
1, 0, 1, 1, 0, 1, // bottom left
1, 0, 1, 1, 0, 1, // top right
1, 0, 1, 1, 0, 1 // bottom right
]);
let laptopColorBuffer = context.createBuffer();
context.bindBuffer(context.ARRAY_BUFFER, laptopColorBuffer);
context.bufferData(context.ARRAY_BUFFER, laptopColors, context.STATIC_DRAW);
context.useProgram(program);
let vertexBufferAttribLocation = context.getAttribLocation(program, "position");
context.enableVertexAttribArray(vertexBufferAttribLocation);
let colorAttribLocation = context.getAttribLocation(program, "color");
context.enableVertexAttribArray(colorAttribLocation);
let modelViewProjectionMatrixLocation = context.getUniformLocation(program, "modelViewProjectionMatrix");
//let colorLocation = context.getUniformLocation(program, "color");
context.lineWidth(3);
context.clearColor(0, 0, 0, 1);
context.enable(context.DEPTH_TEST);
context.enable(context.BLEND);
context.blendFunc(context.SRC_ALPHA, context.ONE_MINUS_SRC_ALPHA);
let theta = 0;
let offsetX;
let offsetY;
let dragging = false;
function onMouseDrag(event) {
var newX = event.offsetX;
var newY = event.offsetY;
var deltaX = newX - offsetX;
var deltaY = newY - offsetY;
theta += deltaX;
offsetX = newX;
offsetY = newY;
}
function onMouseUp() {
dragging = false;
canvas.removeEventListener("mousemove", onMouseDrag, false);
canvas.removeEventListener("mouseup", onMouseUp, false);
}
function onMouseDown(event) {
if (!dragging) {
dragging = true;
canvas.addEventListener("mousemove", onMouseDrag, false);
canvas.addEventListener("mouseup", onMouseUp, false);
offsetX = event.offsetX;
offsetY = event.offsetY;
}
}
canvas.addEventListener("mousedown", onMouseDown, false);
function draw(timeDelta, aspectRatio) {
context.clear(context.COLOR_BUFFER_BIT | context.DEPTH_BUFFER_BIT);
if (!dragging)
theta += timeDelta * 0.05;
let modelViewProjectionMatrix = matrixToSequence(constructModelViewProjectionMatrix(aspectRatio, theta));
context.useProgram(program);
context.bindBuffer(context.ARRAY_BUFFER, sRGBVertexBuffer);
context.vertexAttribPointer(vertexBufferAttribLocation, 3, context.FLOAT, false, 0, 0);
context.bindBuffer(context.ARRAY_BUFFER, sRGBColorBuffer);
context.vertexAttribPointer(colorAttribLocation, 3, context.FLOAT, false, 0, 0);
context.uniformMatrix4fv(modelViewProjectionMatrixLocation, false, modelViewProjectionMatrix);
//context.uniform4fv(colorLocation, [1, 0, 0, 1]);
context.drawArrays(context.LINES, 0, 24);
context.bindBuffer(context.ARRAY_BUFFER, p3VertexBuffer);
context.vertexAttribPointer(vertexBufferAttribLocation, 3, context.FLOAT, false, 0, 0);
context.bindBuffer(context.ARRAY_BUFFER, p3ColorBuffer);
context.vertexAttribPointer(colorAttribLocation, 3, context.FLOAT, false, 0, 0);
context.uniformMatrix4fv(modelViewProjectionMatrixLocation, false, modelViewProjectionMatrix);
//context.uniform4fv(colorLocation, [0, 1, 0, 1]);
context.drawArrays(context.LINES, 0, 24);
context.bindBuffer(context.ARRAY_BUFFER, hdrVertexBuffer);
context.vertexAttribPointer(vertexBufferAttribLocation, 3, context.FLOAT, false, 0, 0);
context.bindBuffer(context.ARRAY_BUFFER, hdrColorBuffer);
context.vertexAttribPointer(colorAttribLocation, 3, context.FLOAT, false, 0, 0);
context.uniformMatrix4fv(modelViewProjectionMatrixLocation, false, modelViewProjectionMatrix);
//context.uniform4fv(colorLocation, [0, 0, 1, 1]);
context.drawArrays(context.LINES, 0, 24);
context.bindBuffer(context.ARRAY_BUFFER, laptopVertexBuffer);
context.vertexAttribPointer(vertexBufferAttribLocation, 3, context.FLOAT, false, 0, 0);
context.bindBuffer(context.ARRAY_BUFFER, laptopColorBuffer);
context.vertexAttribPointer(colorAttribLocation, 3, context.FLOAT, false, 0, 0);
context.uniformMatrix4fv(modelViewProjectionMatrixLocation, false, modelViewProjectionMatrix);
//context.uniform4fv(colorLocation, [0, 0, 1, 1]);
context.drawArrays(context.LINES, 0, 24);
}
let previousTime;
function tick(time) {
if (previousTime == undefined)
previousTime = time;
let aspectRatio = canvas.clientWidth / canvas.clientHeight;
context.viewport(0, 0, canvas.clientWidth, canvas.clientHeight);
draw(time - previousTime, aspectRatio);
previousTime = time;
window.requestAnimationFrame(tick);
}
window.requestAnimationFrame(tick);
let error = context.getError();
}
window.addEventListener("load", start);
</script>
<br />
Click and drag to rotate!
<br />
The colorful cube is sRGB, normalized to the luminosity of an iPad Pro screen. The white lines describe the gamut of an iPad Pro screen using the Display-P3 color space. The light blue describes the gamut of an ASUS ROG PG27UQ monitor, which is both HDR and wide color. The purple describes the gamut of a SurfaceBook laptop. The coordinate system is XYZ, but transformed such that sRGB is a unit cube.<br />
<br />
In the above diagram, luminosity is roughly equivalent to distance in the +X+Y+Z direction. The chroma (hue and saturation) of a point is roughly the angle between two lines, one of which goes through the origin and the point, and the other goes through the origin and pure white. Therefore, wider colors are characterized by the three primary axes pointing in more opposite directions, whereas luminosity is roughly how far those lines extend.<br />
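To make the geometry above concrete, here is a small sketch (illustrative, not part of the diagram’s own code) that measures a color’s chroma as the angle between its line through the origin and the line through pure white:

```javascript
// Chroma, per the description above: the angle (in radians) between the
// line from the origin through a color's XYZ point and the line from the
// origin through pure white. Scaling a color (making it brighter) moves it
// farther along its line but leaves this angle unchanged.
function angleFromWhite(xyz, whiteXYZ) {
    var dot = function(a, b) { return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]; };
    var length = function(a) { return Math.sqrt(dot(a, a)); };
    // Clamp to guard against floating-point values slightly outside [-1, 1].
    var cosine = Math.min(1, Math.max(-1, dot(xyz, whiteXYZ) / (length(xyz) * length(whiteXYZ))));
    return Math.acos(cosine);
}
```

Doubling every component of a color doubles its luminosity but leaves this angle unchanged, which is the two-red-lightbulbs observation from earlier: brighter, but not a different color.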
<br />
You can see this above. The black and white points are shared between sRGB and Display P3, but the Display P3 monitor can show more points around the middle. So it isn’t more luminous, but it is wider. The ASUS monitor is both wide and HDR, so its axes open up widely, and also extend very far. A monitor that’s HDR but not wide would have the same primaries as sRGB, but would extend out far like the ASUS monitor.<br />
<br />
Luminosity isn’t only tangentially related to color; in fact, each color has exactly one luminosity value. If you take a color and convert it to the XYZ color space, the Y component is luminosity. So an HDR monitor can show colors with Y components significantly larger than a non-HDR monitor can. A wide color monitor can’t, but it can show colors with X and Z values beyond what non-wide monitors can show.<br />
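As a sketch of that conversion (using the standard linear-sRGB-to-XYZ matrix, the inverse of the XYZ-to-sRGB matrix in the Swift Playground later in this post; gamma decoding is omitted for brevity):

```javascript
// Convert a linear (gamma-decoded) sRGB color to XYZ. The middle row is
// the luminosity: Y = 0.2126 r + 0.7152 g + 0.0722 b.
function srgbLinearToXYZ(r, g, b) {
    var X = 0.4124 * r + 0.3576 * g + 0.1805 * b;
    var Y = 0.2126 * r + 0.7152 * g + 0.0722 * b;
    var Z = 0.0193 * r + 0.1192 * g + 0.9505 * b;
    return [X, Y, Z];
}
```

sRGB white (1, 1, 1) maps to Y = 1, the reference white luminosity; an HDR monitor can produce colors whose Y lands well above 1 on this scale.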
<br />
This is kind of interesting, because the sRGB spec says that its white point is defined to be 80 nits (the unit of luminosity). However, over the decades, monitors have gotten brighter, presumably because, psychologically, consumers prefer brighter displays to dimmer ones. Nowadays, most monitors are around 200-300 nits. Therefore, if you strictly adhere to the spec, an sRGB color value (r, g, b) should be some particular point in XYZ space, but in practice, because everyone bought brighter monitors, those same color values (r, g, b) are actually a point with a much greater Y value in XYZ. So different displays have different primaries, but they also have different luminosities, which affects how far from the origin the white point is. You can see this in the above diagram: the Surface Book’s maximum white point is significantly smaller than the color cube, because the Surface Book reports a luminosity of only 270 nits. The diagram above is normalized to the luminosity of an iPad Pro, which is <a href="https://www.laptopmag.com/articles/ipad-pro-10-inch-upgrade">measured</a> by laptopmag.com to be 368 nits.<br />
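That scaling can be sketched as follows (the 80-nit figure is from the sRGB spec; the monitor brightness values here are illustrative):

```javascript
// If a monitor is brighter than the sRGB spec's 80-nit white, its white
// point sits proportionally farther from the origin along the same line
// in XYZ space.
function scaleWhitePointForBrightness(specWhiteXYZ, measuredNits, specNits) {
    var scale = measuredNits / specNits;
    return [specWhiteXYZ[0] * scale, specWhiteXYZ[1] * scale, specWhiteXYZ[2] * scale];
}
```

For example, a 270-nit display’s white point lands 270 / 80 = 3.375 times farther out than the 80-nit spec white point.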
<br />
You can get all this information on Windows by using the <a href="https://docs.microsoft.com/en-us/windows/desktop/api/DXGI1_6/nf-dxgi1_6-idxgioutput6-getdesc1">IDXGIOutput6::GetDesc1()</a> API call. This call gives you a lot of information, and it’s a little bit difficult to decipher. The redPrimary, greenPrimary, and bluePrimary fields give you the direction of each of the primaries in XYZ space. Each one is reported as an (x, y) tuple, which is the result of the <a href="https://en.wikipedia.org/wiki/CIE_1931_color_space#CIE_xy_chromaticity_diagram_and_the_CIE_xyY_color_space">calculations</a> X/(X+Y+Z) and Y/(X+Y+Z), respectively. Notice that you’re only given two pieces of information; that means this isn’t a 3D point in XYZ space, but rather a line. The line can be given in parametric form:<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">X(t) = x * t</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">Y(t) = y * t</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">Z(t) = (1-x-y) * t</span><br />
<br />
As you can see, this passes through the origin and extends outward in some direction forever. Therefore, these xy values give you direction, but not magnitude.<br />
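In code, that parametric line looks like this (a sketch; t is the free parameter):

```javascript
// An (x, y) chromaticity gives only a direction in XYZ space. Every choice
// of t lands on the same line through the origin, so every choice has the
// same chromaticity; the magnitude is still unknown.
function pointOnChromaticityLine(x, y, t) {
    return [x * t, y * t, (1 - x - y) * t];
}
```

Recomputing x = X / (X + Y + Z) from any point on the line returns the original x, since the components always sum to t.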
<br />
To get magnitude, you need to consider the white point. The white point also has a direction, given in xy coordinates, which tells you which direction the farthest corner of the cube lies in, but not how far along that line the corner is. To figure this out, you have to use the luminance figures reported by that API. Luminance is the Y channel of XYZ, so if you know the Y value and the direction of the line, you can solve for X and Z. Then, once you know that point, you can solve for the maximum extents of the primaries using the fact that redPrimary + greenPrimary + bluePrimary = whitePoint. That gives you the entire cube.<br />
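That solve can be sketched as follows (using the textbook sRGB primaries and D65 white point as illustrative inputs, not values reported by any particular monitor):

```javascript
// 3x3 determinant, used for Cramer's rule below.
function det3(m) {
    return m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
         - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
         + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]);
}
// Find the scale factors s, t, u that place each primary's endpoint along
// its chromaticity line such that red + green + blue = white.
function solvePrimaryScales(redxy, greenxy, bluexy, whiteXYZ) {
    // Each column of the matrix is a primary's direction (x, y, 1 - x - y).
    var columns = [redxy, greenxy, bluexy].map(function(p) {
        return [p[0], p[1], 1 - p[0] - p[1]];
    });
    var M = [0, 1, 2].map(function(row) {
        return [columns[0][row], columns[1][row], columns[2][row]];
    });
    var d = det3(M);
    function withColumnReplaced(i) {
        return M.map(function(rowValues, row) {
            return rowValues.map(function(v, column) {
                return column === i ? whiteXYZ[row] : v;
            });
        });
    }
    return [det3(withColumnReplaced(0)) / d,
            det3(withColumnReplaced(1)) / d,
            det3(withColumnReplaced(2)) / d];
}
```

With sRGB’s primaries and a D65 white of luminosity 1, the Y components of the solved primaries come out to roughly 0.2126, 0.7152, and 0.0722: the familiar sRGB luminosity coefficients.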
<br />
Calculating the cube for iOS is simpler. The Display P3 color space is supposed to match the colors representable on the monitor, so we can interrogate the color space instead of the monitor’s reported info. You can construct a CGColor using <a href="https://developer.apple.com/documentation/coregraphics/cgcolorspace/1408916-displayp3">CGColorSpace.displayP3</a> and then use CGColor’s <a href="https://developer.apple.com/documentation/coregraphics/cgcolor/1455493-converted">conversion function</a> to turn it into an XYZ color. You can then scale the result by the luminosity of the display (which I looked up on laptopmag.com).<br />
<br />
Here's the full text of the Swift Playground I used to calculate the Windows information:<br />
<span style="font-family: Courier New, Courier, monospace;">import Foundation</span><br />
<span style="font-family: Courier New, Courier, monospace;">import CoreGraphics</span><br />
<span style="font-family: Courier New, Courier, monospace;">import GLKit</span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;">func calculateWhitePoint() -> (CGFloat, CGFloat, CGFloat) {</span><br />
<span style="font-family: Courier New, Courier, monospace;"> let xWhite = CGFloat(0.3125)</span><br />
<span style="font-family: Courier New, Courier, monospace;"> let yWhite = CGFloat(0.329101563)</span><br />
<span style="font-family: Courier New, Courier, monospace;"> let zWhite = 1 - xWhite - yWhite</span><br />
<span style="font-family: Courier New, Courier, monospace;"> let luminance = CGFloat(658.345215)</span><br />
<span style="font-family: Courier New, Courier, monospace;"> let normalizedLuminance = luminance / 374</span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;"> let t = normalizedLuminance / yWhite</span><br />
<span style="font-family: Courier New, Courier, monospace;"> let XWhite = xWhite * t</span><br />
<span style="font-family: Courier New, Courier, monospace;"> let YWhite = yWhite * t</span><br />
<span style="font-family: Courier New, Courier, monospace;"> let ZWhite = (1 - xWhite - yWhite) * t</span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;"> return (XWhite, YWhite, ZWhite)</span><br />
<span style="font-family: Courier New, Courier, monospace;">}</span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;">func convertXYZToRGB(X: CGFloat, Y: CGFloat, Z: CGFloat) -> (CGFloat, CGFloat, CGFloat) {</span><br />
<span style="font-family: Courier New, Courier, monospace;"> let r = 3.2406 * X - 1.5372 * Y - 0.4986 * Z</span><br />
<span style="font-family: Courier New, Courier, monospace;"> let g = -0.9689 * X + 1.8758 * Y + 0.0415 * Z</span><br />
<span style="font-family: Courier New, Courier, monospace;"> let b = 0.0557 * X - 0.2040 * Y + 1.0570 * Z</span><br />
<span style="font-family: Courier New, Courier, monospace;"> return (r, g, b)</span><br />
<span style="font-family: Courier New, Courier, monospace;">}</span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;">let (XWhite, YWhite, ZWhite) = calculateWhitePoint()</span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;">// X(t) = x * t</span><br />
<span style="font-family: Courier New, Courier, monospace;">// Y(t) = y * t</span><br />
<span style="font-family: Courier New, Courier, monospace;">// Z(t) = (1 - x - y) * t</span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;">let xRed = Float(0.674804688)</span><br />
<span style="font-family: Courier New, Courier, monospace;">let yRed = Float(0.316406250)</span><br />
<span style="font-family: Courier New, Courier, monospace;">let zRed = 1 - xRed - yRed</span><br />
<span style="font-family: Courier New, Courier, monospace;">let xGreen = Float(0.1953125)</span><br />
<span style="font-family: Courier New, Courier, monospace;">let yGreen = Float(0.708007813)</span><br />
<span style="font-family: Courier New, Courier, monospace;">let zGreen = 1 - xGreen - yGreen</span><br />
<span style="font-family: Courier New, Courier, monospace;">let xBlue = Float(0.151367188)</span><br />
<span style="font-family: Courier New, Courier, monospace;">let yBlue = Float(0.046875)</span><br />
<span style="font-family: Courier New, Courier, monospace;">let zBlue = 1 - xBlue - yBlue</span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;">// Red primary (XRed, YRed, ZRed): s * (xRed, yRed, zRed)</span><br />
<span style="font-family: Courier New, Courier, monospace;">// Green primary (XGreen, YGreen, ZGreen): t * (xGreen, yGreen, zGreen)</span><br />
<span style="font-family: Courier New, Courier, monospace;">// Blue primary (XBlue, YBlue, ZBlue): u * (xBlue, yBlue, zBlue)</span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;">// XWhite = XRed + XGreen + XBlue</span><br />
<span style="font-family: Courier New, Courier, monospace;">// YWhite = YRed + YGreen + YBlue</span><br />
<span style="font-family: Courier New, Courier, monospace;">// ZWhite = ZRed + ZGreen + ZBlue</span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;">// XWhite = s * xRed + t * xGreen + u * xBlue</span><br />
<span style="font-family: Courier New, Courier, monospace;">// YWhite = s * yRed + t * yGreen + u * yBlue</span><br />
<span style="font-family: Courier New, Courier, monospace;">// ZWhite = s * zRed + t * zGreen + u * zBlue</span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;">// [xRed, xGreen, xBlue] [s] [XWhite]</span><br />
<span style="font-family: Courier New, Courier, monospace;">// [yRed, yGreen, yBlue] * [t] = [YWhite]</span><br />
<span style="font-family: Courier New, Courier, monospace;">// [zRed, zGreen, zBlue] [u] [ZWhite]</span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;">let matrix = GLKMatrix3MakeAndTranspose(xRed, xGreen, xBlue, yRed, yGreen, yBlue, zRed, zGreen, zBlue)</span><br />
<span style="font-family: Courier New, Courier, monospace;">let inverted = GLKMatrix3Invert(matrix, nil)</span><br />
<span style="font-family: Courier New, Courier, monospace;">let solution = GLKMatrix3MultiplyVector3(inverted, GLKVector3Make(Float(XWhite), Float(YWhite), Float(ZWhite)))</span><br />
<span style="font-family: Courier New, Courier, monospace;">let s = solution.x</span><br />
<span style="font-family: Courier New, Courier, monospace;">let t = solution.y</span><br />
<span style="font-family: Courier New, Courier, monospace;">let u = solution.z</span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;">let XRed = s * xRed</span><br />
<span style="font-family: Courier New, Courier, monospace;">let YRed = s * yRed</span><br />
<span style="font-family: Courier New, Courier, monospace;">let ZRed = s * zRed</span><br />
<span style="font-family: Courier New, Courier, monospace;">let XGreen = t * xGreen</span><br />
<span style="font-family: Courier New, Courier, monospace;">let YGreen = t * yGreen</span><br />
<span style="font-family: Courier New, Courier, monospace;">let ZGreen = t * zGreen</span><br />
<span style="font-family: Courier New, Courier, monospace;">let XBlue = u * xBlue</span><br />
<span style="font-family: Courier New, Courier, monospace;">let YBlue = u * yBlue</span><br />
<span style="font-family: Courier New, Courier, monospace;">let ZBlue = u * zBlue</span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;">// Let's check our work</span><br />
<span style="font-family: Courier New, Courier, monospace;">XRed + XGreen + XBlue</span><br />
<span style="font-family: Courier New, Courier, monospace;">XWhite</span><br />
<span style="font-family: Courier New, Courier, monospace;">YRed + YGreen + YBlue</span><br />
<span style="font-family: Courier New, Courier, monospace;">YWhite</span><br />
<span style="font-family: Courier New, Courier, monospace;">ZRed + ZGreen + ZBlue</span><br />
<span style="font-family: Courier New, Courier, monospace;">ZWhite</span><br />
<span style="font-family: Courier New, Courier, monospace;">XRed / (XRed + YRed + ZRed)</span><br />
<span style="font-family: Courier New, Courier, monospace;">xRed</span><br />
<span style="font-family: Courier New, Courier, monospace;">YRed / (XRed + YRed + ZRed)</span><br />
<span style="font-family: Courier New, Courier, monospace;">yRed</span><br />
<span style="font-family: Courier New, Courier, monospace;">XGreen / (XGreen + YGreen + ZGreen)</span><br />
<span style="font-family: Courier New, Courier, monospace;">xGreen</span><br />
<span style="font-family: Courier New, Courier, monospace;">YGreen / (XGreen + YGreen + ZGreen)</span><br />
<span style="font-family: Courier New, Courier, monospace;">yGreen</span><br />
<span style="font-family: Courier New, Courier, monospace;">XBlue / (XBlue + YBlue + ZBlue)</span><br />
<span style="font-family: Courier New, Courier, monospace;">xBlue</span><br />
<span style="font-family: Courier New, Courier, monospace;">YBlue / (XBlue + YBlue + ZBlue)</span><br />
<span style="font-family: Courier New, Courier, monospace;">yBlue</span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;">// 0 0 0 -> 0 0 0</span><br />
<span style="font-family: Courier New, Courier, monospace;">// 1 0 0 -> XRed, YRed, ZRed</span><br />
<span style="font-family: Courier New, Courier, monospace;">// 0 1 0 -> XGreen, YGreen, ZGreen</span><br />
<span style="font-family: Courier New, Courier, monospace;">// 0 0 1 -> XBlue, YBlue, ZBlue</span><br />
<span style="font-family: Courier New, Courier, monospace;">// 1 1 0 -> XRed + XGreen, YRed + YGreen, ZRed + ZGreen</span><br />
<span style="font-family: Courier New, Courier, monospace;">// 0 1 1 -> XGreen + XBlue, YGreen + YBlue, ZGreen + ZBlue</span><br />
<span style="font-family: Courier New, Courier, monospace;">// 1 0 1 -> XRed + XBlue, YRed + YBlue, ZRed + ZBlue</span><br />
<span style="font-family: Courier New, Courier, monospace;">// 1 1 1 -> XRed + XGreen + XBlue, YRed + YGreen + YBlue, ZRed + ZGreen + ZBlue</span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;">let _000 = convertXYZToRGB(X: 0, Y: 0, Z: 0)</span><br />
<span style="font-family: Courier New, Courier, monospace;">let _100 = convertXYZToRGB(X: CGFloat(XRed), Y: CGFloat(YRed), Z: CGFloat(ZRed))</span><br />
<span style="font-family: Courier New, Courier, monospace;">let _010 = convertXYZToRGB(X: CGFloat(XGreen), Y: CGFloat(YGreen), Z: CGFloat(ZGreen))</span><br />
<span style="font-family: Courier New, Courier, monospace;">let _001 = convertXYZToRGB(X: CGFloat(XBlue), Y: CGFloat(YBlue), Z: CGFloat(ZBlue))</span><br />
<span style="font-family: Courier New, Courier, monospace;">let _110 = convertXYZToRGB(X: CGFloat(XRed + XGreen), Y: CGFloat(YRed + YGreen), Z: CGFloat(ZRed + ZGreen))</span><br />
<span style="font-family: Courier New, Courier, monospace;">let _011 = convertXYZToRGB(X: CGFloat(XGreen + XBlue), Y: CGFloat(YGreen + YBlue), Z: CGFloat(ZGreen + ZBlue))</span><br />
<span style="font-family: Courier New, Courier, monospace;">let _101 = convertXYZToRGB(X: CGFloat(XRed + XBlue), Y: CGFloat(YRed + YBlue), Z: CGFloat(ZRed + ZBlue))</span><br />
<span style="font-family: Courier New, Courier, monospace;">let _111 = convertXYZToRGB(X: CGFloat(XRed + XGreen + XBlue), Y: CGFloat(YRed + YGreen + YBlue), Z: CGFloat(ZRed + ZGreen + ZBlue))</span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;">/*</span><br />
<span style="font-family: Courier New, Courier, monospace;">0, 0, 0, 0, 1, 0, // left front</span><br />
<span style="font-family: Courier New, Courier, monospace;">0, 0, 0, 1, 0, 0, // bottom front</span><br />
<span style="font-family: Courier New, Courier, monospace;">1, 0, 0, 1, 1, 0, // right front</span><br />
<span style="font-family: Courier New, Courier, monospace;">0, 1, 0, 1, 1, 0, // top front</span><br />
<span style="font-family: Courier New, Courier, monospace;">*/</span><br />
<span style="font-family: Courier New, Courier, monospace;">print("\(_000.0), \(_000.1), \(_000.2), \(_010.0), \(_010.1), \(_010.2), // left front")</span><br />
<span style="font-family: Courier New, Courier, monospace;">print("\(_000.0), \(_000.1), \(_000.2), \(_100.0), \(_100.1), \(_100.2), // bottom front")</span><br />
<span style="font-family: Courier New, Courier, monospace;">print("\(_100.0), \(_100.1), \(_100.2), \(_110.0), \(_110.1), \(_110.2), // right front")</span><br />
<span style="font-family: Courier New, Courier, monospace;">print("\(_010.0), \(_010.1), \(_010.2), \(_110.0), \(_110.1), \(_110.2), // top front")</span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;">/*</span><br />
<span style="font-family: Courier New, Courier, monospace;">0, 0, 1, 0, 1, 1, // left back</span><br />
<span style="font-family: Courier New, Courier, monospace;">0, 0, 1, 1, 0, 1, // bottom back</span><br />
<span style="font-family: Courier New, Courier, monospace;">1, 0, 1, 1, 1, 1, // right back</span><br />
<span style="font-family: Courier New, Courier, monospace;">0, 1, 1, 1, 1, 1, // top back</span><br />
<span style="font-family: Courier New, Courier, monospace;">*/</span><br />
<span style="font-family: Courier New, Courier, monospace;">print("\(_001.0), \(_001.1), \(_001.2), \(_011.0), \(_011.1), \(_011.2), // left back")</span><br />
<span style="font-family: Courier New, Courier, monospace;">print("\(_001.0), \(_001.1), \(_001.2), \(_101.0), \(_101.1), \(_101.2), // bottom back")</span><br />
<span style="font-family: Courier New, Courier, monospace;">print("\(_101.0), \(_101.1), \(_101.2), \(_111.0), \(_111.1), \(_111.2), // right back")</span><br />
<span style="font-family: Courier New, Courier, monospace;">print("\(_011.0), \(_011.1), \(_011.2), \(_111.0), \(_111.1), \(_111.2), // top back")</span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;">/*</span><br />
<span style="font-family: Courier New, Courier, monospace;">0, 1, 0, 0, 1, 1, // top left</span><br />
<span style="font-family: Courier New, Courier, monospace;">0, 0, 0, 0, 0, 1, // bottom left</span><br />
<span style="font-family: Courier New, Courier, monospace;">1, 1, 0, 1, 1, 1, // top right</span><br />
<span style="font-family: Courier New, Courier, monospace;">1, 0, 0, 1, 0, 1 // bottom right</span><br />
<span style="font-family: Courier New, Courier, monospace;">*/</span><br />
<span style="font-family: Courier New, Courier, monospace;">print("\(_010.0), \(_010.1), \(_010.2), \(_011.0), \(_011.1), \(_011.2), // top left")</span><br />
<span style="font-family: Courier New, Courier, monospace;">print("\(_000.0), \(_000.1), \(_000.2), \(_001.0), \(_001.1), \(_001.2), // bottom left")</span><br />
<span style="font-family: Courier New, Courier, monospace;">print("\(_110.0), \(_110.1), \(_110.2), \(_111.0), \(_111.1), \(_111.2), // top right")</span><br />
<span style="font-family: Courier New, Courier, monospace;">print("\(_100.0), \(_100.1), \(_100.2), \(_101.0), \(_101.1), \(_101.2), // bottom right")</span><br />
<div>
<br /></div>
Texture Sampling<br />
<br />
Textures are one of the fundamental data types in 3D graphics. Any time you want to show an image on a 3D surface, you use a texture.<br />
<br />
<h3>
Texture Types</h3>
<br />
First of all, there are many kinds of textures. The simplest kind to understand is the 2D texture, whose purpose is to act like a rectangular image. Each element in the image is configurable; you can specify that it’s a float, or an int, or 4 floats (one for each channel of RGBA), etc. These “elements” are usually called “texels.” There are also 1D and 3D textures, which act analogously.<br />
<br />
Then, you’ve got 1D texture arrays, and 2D texture arrays, which are not simply arrays-of-textures. Instead, they are distinct types, where each element in the array is the relevant texture type. They are their own distinct resource types because GPUs can operate on them in hardware, so the array doesn’t have to be implemented in software. As such, the hardware restricts each element in the array to have the same dimensions. If you don’t like this requirement, you can create a software array of textures, and it will go slower but the requirement won’t apply. (Or you could even have an array of texture arrays!)<br />
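As a concrete sketch of the 2D-texture-array type (using Metal, since this is macOS; the specific sizes and formats here are made up for illustration):

```swift
import Metal

// Sketch: describing a 2D texture array in Metal. Because the array is a
// hardware type, every slice must share the same dimensions and pixel format.
let descriptor = MTLTextureDescriptor()
descriptor.textureType = .type2DArray
descriptor.pixelFormat = .rgba8Unorm
descriptor.width = 256
descriptor.height = 256
descriptor.arrayLength = 16      // 16 slices, all forced to be 256x256 RGBA8
// A MTLDevice would turn this descriptor into an actual texture:
// let texture = device.makeTexture(descriptor: descriptor)
```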
<br />
<h3>
Mipmaps</h3>
<br />
There’s one other important piece to textures: mipmaps. Generally, textures are mapped onto arbitrary 3d geometry, which means that the number of pixels on-screen the texture is stretched over is totally arbitrary. Using regular projection matrices, the farther the geometry is from the viewer, the fewer pixels the texture is mapped onto.<br />
<br />
Consider drawing a single pixel of geometry that is far away from the camera. Here, the entire texture will be squished to fit into a small number of pixels, so that single pixel will be covered by many texels. So, if the renderer wanted to compute an accurate color for that pixel, it would have to average all the covered texels together. However, what if that geometry moves closer to the camera, such that each pixel contains only ~1 texel? In this situation, no averaging is necessary; you can just do a single read of the texture data.<br />
<br />
So, if the texture is big relative to the size it’s drawn on-screen, that’s a problem, but if it’s small, that’s no problem. Think about that for a second - big data sizes are a problem, but small data sizes are okay. So what if the system could just reduce the big texture to a small texture as a preprocess? In fact, if there were a collection of reductions of various sizes, there would always be a size appropriate for the number of pixels being drawn.<br />
<br />
That’s exactly what a mipmap is. If a 2D texture has dimensions m * n, the object also has storage for an additional level of m/2 * n/2, another of m/4 * n/4, and so on, down to a single texel. This doesn’t waste much memory: each 2D level holds a quarter of the texels of the one before it, and x + x/4 + x/16 + … = (4/3) * x, so the whole chain costs only about a third more than the base texture alone. (For a 1D texture, where each level is half the previous one, x + x/2 + x/4 + … = 2*x, so even there the overhead is at most the size of the original.) This storage scheme also assumes that texture sizes are powers of two, which was traditionally required, though nowadays many implementations have extensions that relax this requirement.<br />
<br />
So, naïvely, addressing a 2D texture requires 3 components: x, y, and which mipmap level. 3D textures require 4 components, and 1D textures require 2 components. 2D texture arrays require 4 components (there’s an extra one for the layer in the array) and 1D texture arrays require 3 components. With these components, the system only has to do a single read at runtime - no looping over texels required.<br />
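To make the storage math concrete, here’s a small sketch (plain Swift, no graphics API) that enumerates the mip levels of a 2D texture and tallies the memory overhead; since each 2D level holds a quarter of the texels of the one before it, the whole chain totals about 4/3 of the base level:

```swift
// Enumerate every mipmap level of a 2D texture: halve each dimension
// (never below 1) until reaching a single texel.
func mipChain(width: Int, height: Int) -> [(width: Int, height: Int)] {
    var levels = [(width: width, height: height)]
    var (w, h) = (width, height)
    while w > 1 || h > 1 {
        w = max(1, w / 2)
        h = max(1, h / 2)
        levels.append((width: w, height: h))
    }
    return levels
}

let chain = mipChain(width: 256, height: 256)   // 9 levels: 256x256 down to 1x1
let texels = chain.map { $0.width * $0.height }
let overhead = Double(texels.dropFirst().reduce(0, +)) / Double(texels[0])
// overhead comes out to about 0.333: the extra levels cost roughly a third
// of the base texture.
```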
<br />
<h3>
Automatic Miplevel Selection</h3>
<br />
The shader API, however, can calculate the mipmap level for you, so you don’t have to do that yourself in the shader (though you can if you want to). The key here is to figure out how many texels per pixel the texture is getting squished down to. If the answer is 2, you should use the second mipmap level. If the answer is 4, you should use the third mipmap level (since each level is half as large as the previous).<br />
<br />
So how does the system know how many texels cover your pixel? Well, if you think about it, this is the screen-space derivative of the sampling coordinate in the base level. Stated differently, it’s the rate of change of the texture coordinate (in texels) across the screen. So, how do you calculate this?<br />
<br />
If the code you’re writing is differentiable, you could calculate the derivative yourself in closed form and use that. However, the system can approximate it automatically, using the fact that the GPU scheduler can schedule fragment shader threads however it likes. If the scheduler dispatches fragment shader threads in 2x2 blocks, the threads in a block can share data with each other. Approximating the derivative is then easy: it’s rise-over-run, the difference of adjacent sampling coordinates divided by the difference of adjacent screen-space coordinates. Because we are sampling adjacent pixels, the difference of adjacent screen-space coordinates is just 1, so the derivative is computed by simply subtracting the sampling positions of adjacent pixels. The pixels in the 2x2 block share the result. (Of course, this sharing only works if every fragment shader in the 2x2 block is at the same point in the shader, so they can cooperate.)<br />
<br />
So, the system does this subtraction of adjacent sampling coordinates to estimate the derivative, and takes the log base 2 of the derivative to select which miplevel to use. The result may not be exactly integral, so the sampler describes whether to round to the nearest integer miplevel or to read both straddling miplevels and compute a weighted average. You can also short-circuit this computation by explicitly specifying derivatives to use (the derivatives won’t be automatically calculated, but everything else works the same way) or by specifying which miplevel to use directly.<br />
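The whole selection can be sketched in a few lines of plain Swift (using the length of the derivative vector, one admissible choice for the spec’s scale function):

```swift
import Foundation

// Sketch of automatic miplevel selection. (dudx, dvdx) is the screen-x
// derivative of the denormalized texel coordinate; (dudy, dvdy) is the
// screen-y derivative. The GPU gets these by differencing a 2x2 quad.
func mipLevel(dudx: Double, dvdx: Double, dudy: Double, dvdy: Double) -> Double {
    let rhoX = (dudx * dudx + dvdx * dvdx).squareRoot()  // texels per pixel, horizontally
    let rhoY = (dudy * dudy + dvdy * dvdy).squareRoot()  // texels per pixel, vertically
    let rho = max(rhoX, rhoY)     // the worse direction wins when not anisotropic
    return max(0.0, log2(rho))    // clamp: magnification stays on the base level
}

// 4 texels per pixel -> lambda = 2; the sampler then either rounds this
// to one level or blends levels 2 and 3 using the fractional part.
let lambda = mipLevel(dudx: 4, dvdx: 0, dudy: 0, dvdy: 4)
```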
<br />
<h3>
Dimension Reduction</h3>
<br />
But I’ve breezed over one of the details here - 2D textures have 2-dimensional texel coordinates, and screens also have 2-dimensional coordinates. How do we reduce these to a single miplevel? The Vulkan spec doesn’t actually describe exactly how to reduce the 2-dimensional texel coordinates into a single scalar, but it <a href="https://www.khronos.org/registry/vulkan/specs/1.1/html/vkspec.html#textures-scale-factor">does say</a> in section 15.6.7:<br />
<br />
ρ<sub>x</sub> and ρ<sub>y</sub> may be approximated with functions f<sub>x</sub> and f<sub>y</sub>, subject to the following constraints:<br />
f<sub>x</sub> is continuous and monotonically increasing in each of m<sub>ux</sub>, m<sub>vx</sub>, and m<sub>wx</sub><br />
f<sub>y</sub> is continuous and monotonically increasing in each of m<sub>uy</sub>, m<sub>vy</sub>, and m<sub>wy</sub><br />
max(|m<sub>ux</sub>|, |m<sub>vx</sub>|, |m<sub>wx</sub>|) &le; f<sub>x</sub> &le; sqrt(2) * (|m<sub>ux</sub>| + |m<sub>vx</sub>| + |m<sub>wx</sub>|)<br />
max(|m<sub>uy</sub>|, |m<sub>vy</sub>|, |m<sub>wy</sub>|) &le; f<sub>y</sub> &le; sqrt(2) * (|m<sub>uy</sub>| + |m<sub>vy</sub>| + |m<sub>wy</sub>|)<br />
<br />
So, you reduce the n-dimensional texture coordinate to a scalar by making up a formula that fits the above requirements. You apply the function twice - once for the horizontal screen derivative direction, and once for the vertical screen derivative direction.<br />
<br />
So this tells you (roughly) how many texels fit in the pixel vertically, and how many texels fit in the pixel horizontally. But these values don’t have to be the same. Imagine looking out in first-person across a rendered floor. There are many texels squished vertically, but not that many horizontally.<br />
<br />
This is called anisotropy. The amount of anisotropy is just the ratio of these two values. By default (without anisotropic filtering), texture sampling uses the larger of the two values when figuring out which miplevel to use, which avoids aliasing along the squished direction at the cost of blurring the other one. Remember - miplevels are zero-indexed, so the smaller the index, the more data is in that level, meaning the smallest miplevel index holds the highest level of detail. However, there are some techniques in this area that involve doing extra work to improve the quality of the result.<br />
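A sketch of how anisotropic filtering adjusts the selection, loosely following the Vulkan model (the exact formulas vary per implementation, so treat this as an approximation):

```swift
import Foundation

// rhoX / rhoY: texels per pixel along the two screen directions.
// Instead of letting the worse direction force a blurry level, take up to
// maxAnisotropy samples along the squished axis and use a sharper level.
func anisotropicSelection(rhoX: Double, rhoY: Double,
                          maxAnisotropy: Double) -> (level: Double, samples: Int) {
    let rhoMax = max(rhoX, rhoY)
    let rhoMin = max(min(rhoX, rhoY), 1e-9)        // guard against division by zero
    let eta = min(rhoMax / rhoMin, maxAnisotropy)  // anisotropy ratio, clamped
    let level = max(0.0, log2(rhoMax / eta))       // sharper than log2(rhoMax)
    return (level, Int(eta.rounded(.up)))
}

// Looking out across a floor: 8 texels per pixel vertically, 1 horizontally.
// With 16x anisotropy allowed, we can stay on level 0 and take 8 samples.
let result = anisotropicSelection(rhoX: 1, rhoY: 8, maxAnisotropy: 16)
```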
<br />
<h3>
Wrapping Things Up</h3>
<br />
At this point, the sampler provides shader authors some control over the miplevel selection. The sampler / optional arguments can include a “LOD Bias” which gets added to this value, so the author can get higher-or-lower detail as necessary. The sampler / optional arguments can also include a “LOD Clamp” which will be applied here, if, for example, not all the miplevels of the texture have their contents populated yet.<br />
<br />
So, now that you have a miplevel, you can do the rest of the operation. If the sampler says the sampling coordinate is normalized, you denormalize it by multiplying by the dimensions of the miplevel, and modulus / mirror / whatever the sampler tells you to do. Then, depending on the sampler settings, you either round the denormalized coordinates to the nearest integer, or you read all the straddling texels and perform a weighted average. Then, if the sampler tells you to, you do it all again at the next miplevel, and perform yet another weighted average.<br />
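The tail end of that pipeline can be sketched for a 1D texture: denormalize, wrap (here clamp-to-edge, one of the sampler’s options), then blend the two straddling texels for linear filtering. Note that texel i’s center sits at (i + 0.5) / width:

```swift
// Linear filtering of a 1D texture with clamp-to-edge wrapping.
func sampleLinear(texels: [Double], coordinate: Double) -> Double {
    let width = Double(texels.count)
    // Denormalize, shifting by 0.5 so texel centers land on integers.
    let x = min(max(coordinate * width - 0.5, 0), width - 1)  // clamp-to-edge
    let i0 = Int(x.rounded(.down))
    let i1 = min(i0 + 1, texels.count - 1)
    let t = x - Double(i0)                 // weight between the straddling texels
    return texels[i0] * (1 - t) + texels[i1] * t
}

// Two texels, black then white: full black up to 1/4 of the texture,
// a gradient from 1/4 to 3/4, and full white from 3/4 onward.
let mid = sampleLinear(texels: [0, 1], coordinate: 0.5)   // mid-gradient: 0.5
```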
<br />
There’s one last tiny detail I’ve skipped over, and that is the fact that texel elements are considered to lie at the center of the texel. So, if you have a 1D texture with 2 texels, where one is black and one is white, 1/4 of the way through the texture will be full black, 3/4 of the way through will be full white, and from 1/4 to 3/4 there will be a gradient from black to white. But what is drawn from 0 to 1/4 and from 3/4 to 1? What about values less than 0 or greater than 1? The sampler allows for configuring this. The modulus / mirroring operation results in a value that is either on the interior of the texture, or within 1 texel of the edge. These edge texels either get values from being repeated / mirrored / whatever, or they can just be set to a constant “border color.” This color is fed as input to the weighted average calculation, so everything just works correctly.<br />
<br />
Comparison of Entity Framework to Core Data<br />
<br />
Object Relational Mapping libraries connect in-memory object graphs to relational databases. Object-oriented programming is built upon the idea that there is an in-memory object graph, where each object is an instance of a class. An ORM is the software that can save that object graph to a database, either on-disk or using a service across the network.<br />
<br />
<a href="https://docs.microsoft.com/en-us/ef/index">Entity Framework</a> is Microsoft’s premier ORM library, and <a href="https://developer.apple.com/documentation/coredata">Core Data</a> is Apple’s premier ORM library. Both have the same goals - to persist an object graph to a database - but they were developed by different companies for different languages. It stands to reason that they made some different design choices.<br />
<br />
<h2>
Which Entity Framework?</h2>
<br />
Microsoft is infamous for creating multiple ways to do the same thing, and ORM libraries are no different. There are two versions of Entity Framework: Entity Framework 6 and Entity Framework Core. The <a href="https://docs.microsoft.com/en-us/ef/efcore-and-ef6/choosing">documentation</a> says that Entity Framework Core is the new hotness. Also, Entity Framework Core is <a href="https://github.com/aspnet/EntityFrameworkCore">open source</a>.<br />
<br />
So let’s start using Entity Framework Core, right? Well, not so fast. It turns out that you have to pick a runtime that Entity Framework Core will run on top of.<br />
<br />
<h2>
Which Runtime?</h2>
<br />
Entity Framework was originally developed for .NET. So that’s fine, but it turns out there are multiple versions of .NET.<br />
<ul>
<li><a href="https://en.wikipedia.org/wiki/.NET_Framework">.NET Framework</a> only runs on Windows</li>
<li><a href="https://docs.microsoft.com/en-us/dotnet/core/index">.NET Core</a> is written by Microsoft, and runs on Windows, Linux, and macOS. The documentation says that .NET Core is better than .NET Framework. Also, .NET Core is <a href="https://github.com/dotnet/core">open source</a>.</li>
<li><a href="https://docs.microsoft.com/en-us/dotnet/standard/net-standard">.NET Standard</a> is just a standard. It isn’t a piece of software - it’s a specification that describes a level of support that a runtime needs to have in order to be compliant. <a href="https://visualstudio.microsoft.com/xamarin/">Xamarin</a> is another .NET runtime that supports the .NET Standard (and it runs on iOS / Android). Targeting this runtime means your app will work in every .NET implementation, but it won’t have access to some of the libraries only present in .NET Core.</li>
<li>The <a href="https://en.wikipedia.org/wiki/Universal_Windows_Platform">Universal Windows Platform</a> is a runtime compliant with the .NET Standard. The Entity Framework documentation says that UWP is now supported. One interesting note: as part of the compilation process, the platform-independent .NET bytecode is run through the <a href="https://docs.microsoft.com/en-us/dotnet/framework/net-native/index">.NET Native</a> toolchain, which produces a platform-dependent binary. They say this is to improve performance. (So I guess this means that the Universal Windows Platform isn’t really universal?) This compilation is somewhat lossy because reflection doesn’t fully work in native apps, and it sounds like Entity Framework had some bugs here that they had to fix.</li>
</ul>
There’s an <a href="https://docs.microsoft.com/en-us/ef/core/get-started/uwp/getting-started">example</a> in the Entity Framework Core documentation about how to use it with the Universal Windows Platform, and UWP is the new hotness, so I’ll use that. If you dig into the example, you’ll find that the Entity Framework tools don’t work with UWP projects, so they had to make a dummy .NET Core project with nothing inside it, just to run the tools. How unfortunate.<br />
<br />
<h2>
Getting Entity Framework</h2>
<br />
Entity Framework is not built in to the system. Instead, you’ll have to get it from Visual Studio’s blessed package manager, named <a href="https://www.nuget.org/">NuGet</a>. When you install packages with NuGet, they’re not installed across the whole system; instead, they’re installed only for a single project. NuGet is built in to Visual Studio - simply go to Project -> Manage NuGet Packages to search/install packages.<br />
<br />
Entity Framework is designed to be pluggable to different kinds of databases, and each database has its own package inside NuGet. The example uses a SQLite database, so it uses the Microsoft.EntityFrameworkCore.Sqlite package. There is also another package, Microsoft.EntityFrameworkCore.Tools, which includes command-line tools to generate migration code / apply migrations, so that one is included too.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEglOdjbIenBJtQVliiSYDPKnmYzDbAH5Rz_XzcxP7yF7wviUkDy_jdOoSTWQhQqe20sbjZHVd25_LwNx6CATKWTEcQ3odvNE8FpbiiXtQUZKeOjRTXsSC0q5II41hG9qveTWoo05uKf8USe/s1600/NuGet.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1034" data-original-width="1600" height="206" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEglOdjbIenBJtQVliiSYDPKnmYzDbAH5Rz_XzcxP7yF7wviUkDy_jdOoSTWQhQqe20sbjZHVd25_LwNx6CATKWTEcQ3odvNE8FpbiiXtQUZKeOjRTXsSC0q5II41hG9qveTWoo05uKf8USe/s320/NuGet.PNG" width="320" /></a></div>
<br />
<br />
<h2>
How to get Core Data</h2>
<br />
It’s already part of the platform, and there’s only one version. Just use it.<br />
<br />
<h2>
High Level</h2>
<br />
Both libraries have a concept of a “context” which is the thing that holds the link to all the objects in the object graph. For Entity Framework, this is the <a href="https://docs.microsoft.com/en-us/dotnet/api/microsoft.entityframeworkcore.dbcontext?view=efcore-2.1">Microsoft.EntityFrameworkCore.DbContext</a>, and for Core Data, this is the <a href="https://developer.apple.com/documentation/coredata/nsmanagedobjectcontext">NSManagedObjectContext</a>. When you create an object, you register it with the context, and when you delete an object, you notify the context that it has been deleted. After you’ve done all your modifications, you tell the context to “save,” which stores all the changes in the database.<br />
<br />
Entity Framework:<br />
<code>var blog = new Blog { Url = url };<br />
db.Blogs.Add(blog);<br />
db.SaveChanges();</code><br />
<br />
Core Data:<br />
<code>let blog = Blog(context: context)<br />
blog.url = url<br />
try context.save()</code><br />
<br />
Read/Modify/Write operations are also quite similar:<br />
<br />
Entity Framework:<br />
<code>var blog = db.Blogs.First();<br />
blog.Url = url;<br />
db.SaveChanges();</code><br />
<br />
Core Data:<br />
<code>let fetchRequest = Blog.fetchRequest() as! NSFetchRequest&lt;Blog&gt;<br />
fetchRequest.fetchLimit = 1<br />
let blog = try context.fetch(fetchRequest)[0]<br />
blog.url = url<br />
try context.save()</code><br />
<br />
<br />
<h2>
Context</h2>
<br />
In Core Data, the NSManagedObjectContext is just a class. When modifications are made to the object graph, the NSManagedObjectContext makes a strong reference to the modified object (because Swift is reference-counted, the distinction between strong and weak references is important). When it comes time to save, the NSManagedObjectContext knows what to save.<br />
<br />
However, in Entity Framework, the DbContext is magical. The application needs to subclass DbContext, and the subclass needs to have <a href="https://docs.microsoft.com/en-us/dotnet/api/microsoft.entityframeworkcore.dbset-1?view=efcore-2.1">DbSet</a> properties. These DbSets refer to the various tables in the database. When the DbContext’s constructor is run, it <a href="https://github.com/aspnet/EntityFrameworkCore/blob/master/src/EFCore/Internal/DbSetFinder.cs">uses reflection</a> to inspect itself, find all the DbSet properties, and inspect the generic type argument to determine the data model. It builds up a <a href="https://docs.microsoft.com/en-us/dotnet/api/microsoft.entityframeworkcore.modelbuilder?view=efcore-2.1">Microsoft.EntityFrameworkCore.ModelBuilder</a>, and lets you make any last-minute changes you want inside <a href="https://docs.microsoft.com/en-us/dotnet/api/microsoft.entityframeworkcore.dbcontext.onmodelcreating?view=efcore-2.1#Microsoft_EntityFrameworkCore_DbContext_OnModelCreating_Microsoft_EntityFrameworkCore_ModelBuilder_">DbContext.OnModelCreating()</a>.<br />
<br />
<h2>
Objects</h2>
<br />
In Core Data, each object in the object graph is represented by <a href="https://developer.apple.com/documentation/coredata/nsmanagedobject">NSManagedObject</a>. This object acts like a dictionary; you can “set properties” by using the <a href="https://developer.apple.com/documentation/foundation/object_runtime/nskeyvaluecoding">Key-Value Coding</a> functions value(forKey:) and setValue(_:forKey:). You can get better type-safety if you subclass NSManagedObject for each of your entities and add typed properties. However, if you do this, you have to make sure that getting/setting these properties calls the Key-Value Coding methods on the inner NSManagedObject. Swift has a helpful keyword, @NSManaged, which does this for you. Further, Xcode will even generate the subclass for you at compilation time, with the appropriately typed @NSManaged properties, if you select the appropriate value for “Codegen” in the right sidebar, with the entity selected. (Or you can use the managedObjectClassName string property on NSEntityDescription when building the NSManagedObjectModel, and Core Data will construct this class at runtime using the <a href="https://developer.apple.com/documentation/objectivec">Objective-C runtime</a>).<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiMDKtgFzBn1AAcMhlxaJXBKNTAt3GOjuWXiL7nUimqrPpXlORZ18AaBq66jrp9zLoM5X42yV12mZZcAhfTitZIkNOmVZeCmUIPe79ca41fWZKOzwRpF5CD7w1tYHsUCMSOrJe0QLIPw8rL/s1600/Screen+Shot+2018-09-09+at+3.27.25+PM.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="511" data-original-width="260" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiMDKtgFzBn1AAcMhlxaJXBKNTAt3GOjuWXiL7nUimqrPpXlORZ18AaBq66jrp9zLoM5X42yV12mZZcAhfTitZIkNOmVZeCmUIPe79ca41fWZKOzwRpF5CD7w1tYHsUCMSOrJe0QLIPw8rL/s320/Screen+Shot+2018-09-09+at+3.27.25+PM.png" width="162" /></a></div>
<br />
<br />
NSManagedObjects know which context they belong to; their initializer requires you to pass in the context. This is presumably so that, when values get modified, the NSManagedObject can notify the NSManagedObjectContext.<br />
<br />
In Entity Framework, each object in the object graph is just a regular object. No subclassing required, and no manifest or custom model creation code either. The DbContext learns about the object’s shape from reflection. This means that the ChangeTracker in the DbContext doesn’t automatically know about changes; instead, it has to call DetectChanges(), which <a href="https://github.com/aspnet/EntityFrameworkCore/blob/release/2.2/src/EFCore/ChangeTracking/Internal/ChangeDetector.cs">iterates</a> through the known objects. This happens automatically whenever it’s required.<br />
<br />
<h2>
Connection Between Classes and Data</h2>
<br />
In Core Data, when the system wants to populate a property of an object, it can do so dynamically, because property accesses are funneled through value(forKey:) and setValue(_:forKey:). This way, the caller doesn’t have to know the name of the field at compilation time, which is required when the data model is created at runtime.<br />
<br />
However, in Entity Framework, objects are just regular classes. This is a problem, though; how can Entity Framework set the correct property on the class when the name of the property is only known at runtime (because the model can be modified at runtime)? Well, it turns out it uses <a href="https://docs.microsoft.com/en-us/dotnet/standard/using-linq">Linq</a> to <a href="https://github.com/aspnet/EntityFrameworkCore/blob/master/src/EFCore.Relational/Query/ExpressionVisitors/Internal/MaterializerFactory.cs">build</a> a program at runtime that <a href="https://docs.microsoft.com/en-us/dotnet/api/system.linq.expressions.expression.makememberaccess?view=netcore-2.1">can set</a> properties that are only known at runtime. This is extremely powerful; it looks like you can use Linq to write almost anything that you could write in C#.<br />
<br />
<h2>
Data Model</h2>
<br />
In Entity Framework, the DbContext constructor uses reflection to discover the object graph. You get a chance to modify the model at runtime in DbContext.OnModelCreating(), which is called inside the DbContext’s constructor. Adding an entity to the model requires a class to match that entity; for properties, however, you <a href="https://docs.microsoft.com/en-us/ef/core/modeling/shadow-properties">can</a> have a property that is present in the model but isn’t present in the class. This is valuable for things like automatically saved date fields.<br />
<br />
In Core Data, there is a separate data file that describes the model declaratively (with the file extension .xcdatamodeld). You can edit these with a GUI inside Xcode. This file corresponds to a <a href="https://developer.apple.com/documentation/coredata/nsmanagedobjectmodel">NSManagedObjectModel</a>, which you can build at runtime instead, if you want. Then, when you bring up the Core Data stack, you can specify this model.<br />
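Building the model at runtime looks roughly like this - a sketch that uses an in-memory store so it runs without a .xcdatamodeld file (the “Blog” entity and “url” attribute names are just illustrative, echoing the earlier examples):

```swift
import CoreData

// Describe one entity, "Blog", with a single optional string attribute.
let urlAttribute = NSAttributeDescription()
urlAttribute.name = "url"
urlAttribute.attributeType = .stringAttributeType
urlAttribute.isOptional = true

let blogEntity = NSEntityDescription()
blogEntity.name = "Blog"
blogEntity.properties = [urlAttribute]

let model = NSManagedObjectModel()
model.entities = [blogEntity]

// Stand the stack up on an in-memory store and save one object.
let container = NSPersistentContainer(name: "Example", managedObjectModel: model)
container.persistentStoreDescriptions.first?.type = NSInMemoryStoreType
container.loadPersistentStores { _, error in precondition(error == nil) }

let context = container.viewContext
let blog = NSManagedObject(entity: blogEntity, insertInto: context)
blog.setValue("https://example.com", forKey: "url")
try! context.save()
```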
<br />
<h2>
Fetch Queries</h2>
<br />
In Entity Framework, the DbSet implements the <a href="https://docs.microsoft.com/en-us/dotnet/api/system.linq.iqueryable?view=netcore-2.1">IQueryable</a> interface, which represents a query node inside the Linq framework. Functions like .Where() and .OrderBy() operate on these nodes and return other nodes, letting you chain the operators up. The operators aren’t actually applied at the time you call them; instead they form a sort of retained-mode program. When you finally pull data out of the query, the runtime looks at the chain of operators and figures out how best to apply it (usually by creating SQL that matches the operation). Some of the operations may need to be applied on the client; this works transparently, but it obviously isn’t great for performance.<br />
<br />
Core Data uses the same sort of thing, encapsulated by NSPredicate and NSExpression. NSExpression is the same kind of node inside a retained-mode program. These are quite powerful; you <a href="https://developer.apple.com/documentation/foundation/nsexpression/1412905-init">can</a> even call arbitrary selectors on arbitrary objects. The big difference between this and Linq is that, in true Objective-C style, NSExpression isn’t typed, but Linq is typed.<br />
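A small sketch of the NSPredicate side (the format-string syntax is untyped, in contrast to Linq; a predicate can be attached to a fetch request, or evaluated directly against any key-value-coding-compliant object - here, a dictionary):

```swift
import Foundation

// NSPredicate is a retained-mode query node: building it executes nothing.
let predicate = NSPredicate(format: "url CONTAINS[c] %@", "example")

// Attached to a fetch request it would become part of the generated SQL,
// but it can also be evaluated in-process against anything KVC-compliant.
// CONTAINS[c] is a case-insensitive substring match.
let matches = predicate.evaluate(with: ["url": "https://EXAMPLE.com/blog"] as NSDictionary)
```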
<br />
<h2>
Parallelism</h2>
<br />
Both Entity Framework and Core Data’s contexts are single-threaded, which means the managed objects all have to live on the same thread as their context. However, fetches and stores involve round trips to databases, which can be quite slow and would block the main thread. Entity Framework gets around this by providing Async versions of the fetching / saving functions. In this model, the objects live on the main thread, but the UI can still be redrawn during the slow database operations.<br />
<br />
Core Data has two approaches to this. One way is to host the entire Core Data object graph in another thread. You get this if the NSManagedObjectContext is initialized with the <a href="https://developer.apple.com/documentation/coredata/nsmanagedobjectcontext/1506709-init">concurrencyType</a> argument set to <a href="https://developer.apple.com/documentation/coredata/nsmanagedobjectcontextconcurrencytype/privatequeueconcurrencytype">.privateQueueConcurrencyType</a>. If you do this, the NSManagedObjectContext will create its own private queue, and operations on the NSManagedObjectContext are only valid from that queue. You run code on that queue by using NSManagedObjectContext’s <a href="https://developer.apple.com/documentation/coredata/nsmanagedobjectcontext/1506578-perform">perform(_:)</a> function. Inside the callback, you can execute your fetch requests, build up some data, and post a message back to the main queue with your data (but not with NSManagedObjects!).<br />
<br />
Alternatively, you can stay on the main queue and use <a href="https://developer.apple.com/documentation/coredata/nsasynchronousfetchrequest">NSAsynchronousFetchRequest</a> to fetch objects asynchronously. As far as I can tell, there is no equivalent call for NSManagedObjectContext.save(), and from my sampling, it appears that NSManagedObjectContext.save() is synchronous (though perhaps it doesn't have to be?).<br />
<br />
Entity Framework:<br />
<code>var blog = await db.Blogs.FirstAsync();<br />
blog.Url = url;<br />
await db.SaveChangesAsync();</code><br />
<br />
Core Data:<br />
<code>let fetchRequest = Blog.fetchRequest() as NSFetchRequest<br />
fetchRequest.fetchLimit = 1<br />
let asynchronousFetch = NSAsynchronousFetchRequest(fetchRequest: fetchRequest) { (result) in<br />
let blog = result.finalResult![0]<br />
blog.url = url<br />
do {<br />
try context.save()<br />
} catch {<br />
…<br />
}<br />
}<br />
try context.execute(asynchronousFetch)</code><br />
<br />
Edit: The <a href="https://developer.apple.com/videos/play/wwdc2012/214/">Core Data Best Practices</a> video from 2012 describes how you can achieve asynchronous saves by using a parent/child NSManagedObjectContext pair. You set the child to live on one thread and the parent to live on the other thread, and when you tell the child to save, it will just push its changes to the other context on the other thread. Then you can asynchronously tell the other thread to save by using perform(_:).<br />
<br />
<h2>
Migrations</h2>
<br />
In Entity Framework, a migration is modeled as a chunk of code. However, this code is written by one of the tools inside Microsoft.EntityFrameworkCore.Tools. The command line tool saves a snapshot of whatever the current database schema is, and can create a new schema by using the same mechanism that DbContext uses when it creates a schema at runtime. Then, after you’ve created a migration, you can apply it, which involves running the code on your local development machine to upgrade the database to the new version. These tools even have <a href="https://docs.microsoft.com/en-us/ef/core/miscellaneous/cli/powershell">documentation</a>. You have a chance to fine-tune the migration by editing the source code the tool created, because creating the migration code and applying it to the database are two distinct steps. Because the migration is generated code, you can run it in your app instead of on your local development machine.<br />
<br />
But wait, not so fast! The command-line tools use reflection on your code to generate a model? Yep. That means the command-line tools build your source code. Then they look in the built code for the new model. If the command-line tools are supposed to perform the migration, then they’re supposed to connect to the database, too. But wait, how do they connect to the database? Well, your source code connects to the database … and the command-line tools will just run that code. The <a href="https://docs.microsoft.com/en-us/ef/core/miscellaneous/cli/dbcontext-creation">documentation</a> describes which functions / classes the tools will look for in your code and run on your local machine.<br />
<br />
Core Data handles migrations totally differently. Some simple migrations can happen automatically, right when you open the database (and you can check whether your change is “simple” by using a class function on <a href="https://developer.apple.com/documentation/coredata/nsmappingmodel">NSMappingModel</a>). But more complicated migrations are described declaratively in a .xcmappingmodel file, which Xcode lets you edit with a GUI. The expressions are described by strings, which (presumably) are the same strings that NSExpression accepts. This file corresponds to an NSMappingModel, which you can also construct at runtime instead of loading from a bundle. Then, when you want to run the migration, you can use <a href="https://developer.apple.com/documentation/coredata/nsmigrationmanager">NSMigrationManager</a> and pass in the NSMappingModel you want it to use. (One gotcha: to create a .xcmappingmodel in Xcode, it has to be between two different versions of the same model. You can create a new version of a model by selecting Editor -> Add Model Version.)<br />
<br />
<h2>
Configuring the Database</h2>
<br />
The constructor to Microsoft.EntityFrameworkCore.DbContext requires a <a href="https://docs.microsoft.com/en-us/dotnet/api/microsoft.entityframeworkcore.dbcontextoptions?view=efcore-2.1">Microsoft.EntityFrameworkCore.DbContextOptions</a>, which is built by a Microsoft.EntityFrameworkCore.DbContextOptionsBuilder. C# has a nifty feature (extension methods) where you can declare a static method outside a class but mark its first parameter with the “this” keyword, and that method will appear as if it were defined inside the parameter’s class. So, the individual database package <a href="https://docs.microsoft.com/en-us/dotnet/api/microsoft.entityframeworkcore.sqlitedbcontextoptionsbuilderextensions.usesqlite?view=efcore-2.1">adds a function</a> to the DbContextOptionsBuilder. (I haven’t investigated what the package does inside this function.) Then, the client code calls optionsBuilder.UseSqlite(connectionString), for example. You can use Microsoft.Data.Sqlite.SqliteConnectionStringBuilder to build the connection string. You do this inside the <a href="https://docs.microsoft.com/en-us/dotnet/api/microsoft.entityframeworkcore.dbcontext.onconfiguring?view=efcore-2.1">DbContext.OnConfiguring()</a> function so the command-line tools know how to configure the database.<br />
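The shape of this pattern can be imitated in Python, as a hedged sketch (C# extension methods have no direct Python equivalent, and the class and function names below are made up to mirror the C# ones): a free function that takes the builder as its first argument, attached to the class after the fact.<br />

```python
# Hypothetical sketch of the extension-method pattern. Names imitate the
# C# API but nothing here is the real EF Core library.
class DbContextOptionsBuilder:
    def __init__(self):
        self.options = {}

# A "free function" whose first parameter is the builder...
def use_sqlite(builder, connection_string):
    builder.options["provider"] = "sqlite"
    builder.options["connection_string"] = connection_string
    return builder

# ...grafted onto the class, so call sites read like a method call,
# just as UseSqlite() appears to be a member of the C# builder.
DbContextOptionsBuilder.use_sqlite = use_sqlite

builder = DbContextOptionsBuilder()
builder.use_sqlite("Data Source=blogging.db")
print(builder.options["provider"])  # sqlite
```

The point of the C# feature is the same as this monkey-patch: the database package, not the core framework, supplies the configuration call, yet client code reads as if the builder always had it.<br />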
<br />
Core Data works differently. Each persistent store is described via a NSPersistentStoreDescription, which includes a string “type” property. This “type” refers to the registeredStoreTypes registry inside NSPersistentStoreCoordinator, which can be extended with additional subclasses of NSPersistentStore. There are also 4 <a href="https://developer.apple.com/documentation/coredata/nspersistentstorecoordinator/persistent_store_types">built-in strings</a> for well-known database types.Litherumhttp://www.blogger.com/profile/12738405376090442005noreply@blogger.com2tag:blogger.com,1999:blog-8778351438463999796.post-68980988613794930722018-08-25T15:00:00.000-07:002018-08-25T15:05:06.010-07:00Development Analogy<style>
td,th {
border: 1px solid white;
}
</style>
<table>
<tbody>
<tr><th>iOS</th><th>Windows</th></tr>
<tr><td>Swift</td><td>C#</td></tr>
<tr><td>Metal</td><td>Direct3D 12</td></tr>
<tr><td>Core Animation</td><td>DirectComposition</td></tr>
<tr><td>Core Graphics</td><td>Direct2D</td></tr>
<tr><td>WebKit</td><td>EdgeHTML</td></tr>
<tr><td>WKWebView</td><td>Windows.UI.Xaml.Controls.WebView</td></tr>
<tr><td>JavaScriptCore</td><td>Chakra</td></tr>
<tr><td>Core Text</td><td>DirectWrite</td></tr>
<tr><td>Core Data</td><td>EntityFramework</td></tr>
<tr><td>XMLParser</td><td>Windows.Data.Xml.Dom</td></tr>
<tr><td>JSONSerialization</td><td>Windows.Data.Json</td></tr>
<tr><td>.xib</td><td>.xaml</td></tr>
<tr><td>.dylib</td><td>.dll</td></tr>
<tr><td>dlopen/dlsym</td><td>LoadLibrary/GetProcAddress</td></tr>
<tr><td>UISplitViewController</td><td>Windows.UI.Xaml.Controls.SplitView</td></tr>
<tr><td>UITextField</td><td>Windows.UI.Xaml.Controls.TextBox</td></tr>
<tr><td>URLSession</td><td>Windows.Web.Http.HttpClient</td></tr>
<tr><td>Bundle</td><td>Windows.ApplicationModel.Package</td></tr>
<tr><td>Xcode</td><td>Visual Studio</td></tr>
</tbody></table>Litherumhttp://www.blogger.com/profile/12738405376090442005noreply@blogger.com1tag:blogger.com,1999:blog-8778351438463999796.post-19454742157333275692017-07-31T14:27:00.000-07:002017-08-02T01:37:14.419-07:00Wide and Deep Color in Metal and OpenGL<div style="-webkit-text-stroke-color: rgb(0, 0, 0); -webkit-text-stroke-width: initial; font-family: 'Helvetica Neue'; font-size: 11px; line-height: normal;">
<span style="-webkit-text-stroke-width: initial; font-size: 11pt;">“Wide Color” and “Deep Color” refer to different things. A color space is “wide” if it has a gamut that is bigger than sRGB. “Gamut” roughly corresponds to how saturated a representable color can be: the wider the color space, the more saturated the colors it can represent.</span></div>
<div style="-webkit-text-stroke-color: rgb(0, 0, 0); -webkit-text-stroke-width: initial; font-family: 'Helvetica Neue'; font-size: 11px; line-height: normal; min-height: 13.1px;">
<span style="-webkit-font-kerning: none; font-size: 11pt;"></span><br /></div>
<div style="-webkit-text-stroke-color: rgb(0, 0, 0); -webkit-text-stroke-width: initial; font-family: 'Helvetica Neue'; font-size: 11px; line-height: normal;">
<span style="-webkit-font-kerning: none; font-size: 11pt;">“Deep color” refers to the number of representable values in a particular encoding of a color space. An encoding of a color space is “deep” if it has more than 2^24 representable values.</span></div>
<div style="-webkit-text-stroke-color: rgb(0, 0, 0); -webkit-text-stroke-width: initial; font-family: 'Helvetica Neue'; font-size: 11px; line-height: normal; min-height: 13.1px;">
<span style="-webkit-font-kerning: none; font-size: 11pt;"></span><br /></div>
<div style="-webkit-text-stroke-color: rgb(0, 0, 0); -webkit-text-stroke-width: initial; font-family: 'Helvetica Neue'; font-size: 11px; line-height: normal;">
<span style="-webkit-font-kerning: none; font-size: 11pt;">Consider widening a color space without making it deeper. In this situation, you have the same number of representable colors, but these individual points are being stretched farther apart. Therefore, the density of representable colors decreases. This is a problem because it means that our eyes might be able to distinguish between adjacent colors with a higher granularity than the granularity at which they are represented. This commonly leads to “banding,” where what should be a smooth gradient of color over an area appears to our eyes as having stripes of individual colors.</span></div>
<div style="-webkit-text-stroke-color: rgb(0, 0, 0); -webkit-text-stroke-width: initial; font-family: 'Helvetica Neue'; font-size: 11px; line-height: normal; min-height: 13.1px;">
<span style="-webkit-font-kerning: none; font-size: 11pt;"></span><br /></div>
<div style="-webkit-text-stroke-color: rgb(0, 0, 0); -webkit-text-stroke-width: initial; font-family: 'Helvetica Neue'; font-size: 11px; line-height: normal;">
<span style="-webkit-font-kerning: none; font-size: 11pt;">Consider deepening a color space without making it wider. In this situation, you are squeezing more and more points within the same volume of colors, making the density of these points increase. Now, adjacent points may be so close that our eyes cannot distinguish them. This results in image quality that isn’t any better, but the amount of information required to store the image is higher, resulting in wasted space.</span></div>
<div style="-webkit-text-stroke-color: rgb(0, 0, 0); -webkit-text-stroke-width: initial; font-family: 'Helvetica Neue'; font-size: 11px; line-height: normal; min-height: 13.1px;">
<span style="-webkit-font-kerning: none; font-size: 11pt;"></span><br /></div>
<div style="-webkit-text-stroke-color: rgb(0, 0, 0); -webkit-text-stroke-width: initial; font-family: 'Helvetica Neue'; font-size: 11px; line-height: normal;">
<span style="-webkit-font-kerning: none; font-size: 11pt;">The trick is to do both at once. Widening the gamut, and increasing the number of representable values within that gamut, keeps the density of points roughly equivalent. More information is required to store the image, and the image looks more vibrant to our eyes.</span></div>
<div style="-webkit-text-stroke-color: rgb(0, 0, 0); -webkit-text-stroke-width: initial; font-family: 'Helvetica Neue'; font-size: 11px; line-height: normal; min-height: 13.1px;">
<span style="-webkit-font-kerning: none; font-size: 11pt;"></span><br /></div>
<h2>
<span style="-webkit-font-kerning: none; font-size: 11pt;">OpenGL</span></h2>
<div style="-webkit-text-stroke-color: rgb(0, 0, 0); -webkit-text-stroke-width: initial; font-family: 'Helvetica Neue'; font-size: 11px; line-height: normal; min-height: 13.1px;">
<span style="-webkit-font-kerning: none; font-size: 11pt;"></span><br /></div>
<div style="-webkit-text-stroke-color: rgb(0, 0, 0); -webkit-text-stroke-width: initial; font-family: 'Helvetica Neue'; font-size: 11px; line-height: normal;">
<span style="-webkit-font-kerning: none; font-size: 11pt;">Originally, OpenGL itself didn’t specify what color space its result pixels are in. At the time it was created, this meant that by default, the results were interpreted as sRGB. However, sRGB is a non-linear color space, which means that math on pixel values is meaningless. Unfortunately, alpha blending is math on pixel values, which meant that, by default, blend operations (and all math done in pixel shaders, unless explicitly corrected by the shader author) were broken.</span></div>
<div style="-webkit-text-stroke-color: rgb(0, 0, 0); -webkit-text-stroke-width: initial; font-family: 'Helvetica Neue'; font-size: 11px; line-height: normal; min-height: 13.1px;">
<span style="-webkit-font-kerning: none; font-size: 11pt;"></span><br /></div>
<div style="-webkit-text-stroke-color: rgb(0, 0, 0); -webkit-text-stroke-width: initial; font-family: 'Helvetica Neue'; font-size: 11px; line-height: normal;">
<span style="-webkit-font-kerning: none; font-size: 11pt;">One solution is to simply make the operating system interpret the pixel results as “linear sRGB.” Indeed, macOS lets you do this by setting the <a href="https://developer.apple.com/documentation/quartzcore/caopengllayer/1521873-colorspace">colorSpace property</a> of an NSWindow or CAOpenGLLayer. Unfortunately, this doesn’t give good results, because these pixel results are in 24-bit color, and all of the representable colors should be (roughly) perceptually equidistant from each other. Our eyes, though, are better at perceiving color differences in low light, which means that dark colors need a higher density of representable values than bright colors. In “linear sRGB,” however, the density of representable values is constant, so we actually don’t have enough definition for dark colors to look good. Increasing the density of representable values would solve the problem for dark colors, but it would make bright colors waste information. (This extra information would probably cost GPU bandwidth, which would probably be fine for just displaying the image on a monitor, but not all GPUs support rendering to > 24-bit color…)</span></div>
<div style="-webkit-text-stroke-color: rgb(0, 0, 0); -webkit-text-stroke-width: initial; font-family: 'Helvetica Neue'; font-size: 11px; line-height: normal; min-height: 13.1px;">
<span style="-webkit-font-kerning: none; font-size: 11pt;"></span><br /></div>
<div style="-webkit-text-stroke-color: rgb(0, 0, 0); -webkit-text-stroke-width: initial; font-family: 'Helvetica Neue'; font-size: 11px; line-height: normal;">
<span style="-webkit-font-kerning: none; font-size: 11pt;">So the colors in the framebuffer need to be in regular sRGB, not linear sRGB. But this means that blending math is meaningless! OpenGL solved this by creating an <a href="https://www.khronos.org/registry/OpenGL/extensions/EXT/EXT_texture_sRGB.txt">extension</a>, EXT_texture_sRGB (which later got promoted to be part of OpenGL Core), which says “whenever you want to perform blending, read the sRGB destination color from the framebuffer, convert it to a float, linearize it, perform the blend, delinearize it, convert it back to 24-bit color, and store it to the framebuffer.” This way, the final results are always in sRGB, but the blending is done in linear space. This ugly processing only happens on the framebuffer color, not on the output of the fragment shader, so your fragment shader can assume that everything is in linear space, and any math it performs will be meaningful.</span></div>
<div style="-webkit-text-stroke-color: rgb(0, 0, 0); -webkit-text-stroke-width: initial; font-family: 'Helvetica Neue'; font-size: 11px; line-height: normal; min-height: 13.1px;">
<span style="-webkit-font-kerning: none; font-size: 11pt;"></span><br /></div>
<div style="-webkit-text-stroke-color: rgb(0, 0, 0); -webkit-text-stroke-width: initial; font-family: 'Helvetica Neue'; font-size: 11px; line-height: normal;">
<span style="-webkit-font-kerning: none; font-size: 11pt;">The trigger to perform this processing is a special format for the framebuffer (so it’s an opt-in feature). Now, in OpenGL, the default framebuffer is not created by OpenGL. Instead, it is created by the Operating System and handed as-is to OpenGL. This means that you have to tell the OS, not OpenGL, to create a framebuffer with one of these special formats. On iOS, you do this by setting the drawableColorFormat of the GLKView. Note that opting in to sRGB is not orthogonal to using other formats - only certain formats are compatible with the sRGB processing.</span></div>
<div style="-webkit-text-stroke-color: rgb(0, 0, 0); -webkit-text-stroke-width: initial; font-family: 'Helvetica Neue'; font-size: 11px; line-height: normal; min-height: 13.1px;">
<span style="-webkit-font-kerning: none; font-size: 11pt;"></span><br /></div>
<div style="-webkit-text-stroke-color: rgb(0, 0, 0); -webkit-text-stroke-width: initial; font-family: 'Helvetica Neue'; font-size: 11px; line-height: normal;">
<span style="-webkit-font-kerning: none; font-size: 11pt;">On iOS, as far as I can tell, OpenGL does not support wide or deep color (because you can’t tell the OS how to interpret the pixel results of OpenGL like you can on macOS - all OpenGL pixels are assumed to be in sRGB). <a href="https://developer.apple.com/documentation/quartzcore/caeagllayer">CAEAGLLayer</a> doesn't have a "colorSpace" property. <a href="https://developer.apple.com/documentation/glkit/glkviewdrawablecolorformat">I can’t find any extended-range formats.</a></span></div>
<div style="-webkit-text-stroke-color: rgb(0, 0, 0); -webkit-text-stroke-width: initial; font-family: 'Helvetica Neue'; font-size: 11px; line-height: normal; min-height: 13.1px;">
<span style="-webkit-font-kerning: none; font-size: 11pt;"></span><br /></div>
<h2>
<span style="-webkit-font-kerning: none; font-size: 11pt;">Metal</span></h2>
<div style="-webkit-text-stroke-color: rgb(0, 0, 0); -webkit-text-stroke-width: initial; font-family: 'Helvetica Neue'; font-size: 11px; line-height: normal; min-height: 13.1px;">
<span style="-webkit-font-kerning: none; font-size: 11pt;"></span><br /></div>
<div style="-webkit-text-stroke-color: rgb(0, 0, 0); -webkit-text-stroke-width: initial; font-family: 'Helvetica Neue'; font-size: 11px; line-height: normal;">
<span style="-webkit-font-kerning: none; font-size: 11pt;">On iOS, Metal supports the same sort of sRGB / non-sRGB formats that OpenGL does. You can set the MTKView’s <a href="https://developer.apple.com/documentation/metalkit/mtkview/1535940-colorpixelformat">colorPixelFormat</a> to one of the sRGB formats, which has the same effect as it does in OpenGL. Setting it to a non-sRGB format means that blending is performed on the encoded values as-is, which is broken; the sRGB formats, however, perform the correct linearization / delinearization for sRGB.</span></div>
<div style="-webkit-text-stroke-color: rgb(0, 0, 0); -webkit-text-stroke-width: initial; font-family: 'Helvetica Neue'; font-size: 11px; line-height: normal; min-height: 13.1px;">
<span style="-webkit-font-kerning: none; font-size: 11pt;"></span><br /></div>
<div style="-webkit-text-stroke-color: rgb(0, 0, 0); -webkit-text-stroke-width: initial; font-family: 'Helvetica Neue'; font-size: 11px; line-height: normal;">
<span style="-webkit-font-kerning: none; font-size: 11pt;">iOS doesn’t support the same sort of color space annotation that macOS does. In particular, a UIWindow or a CALayer doesn’t have a “colorspace” property. Because of this, all colors are expected to be in sRGB. For non-deep and non-wide color, using the regular sRGB pixel formats is sufficient, and these will clamp to the sRGB gamut (meaning clamped between 0 and 1).</span></div>
<div style="-webkit-text-stroke-color: rgb(0, 0, 0); -webkit-text-stroke-width: initial; font-family: 'Helvetica Neue'; font-size: 11px; line-height: normal; min-height: 13.1px;">
<span style="-webkit-font-kerning: none; font-size: 11pt;"></span><br /></div>
<div style="-webkit-text-stroke-color: rgb(0, 0, 0); -webkit-text-stroke-width: initial; font-family: 'Helvetica Neue'; font-size: 11px; line-height: normal;">
<span style="-webkit-font-kerning: none; font-size: 11pt;">And then wide color came along. As noted earlier, wide color and deep color need to happen together, so they aren’t controllable independently. However, there is a conundrum: because the programmer can’t annotate a particular layer with the color space its values should be interpreted in, how do you represent colors outside of sRGB? The solution is for the color space to be extended beyond the 0 - 1 range. This way, colors within 0 - 1 are interpreted as sRGB, as they always have been, while colors outside that range represent the new, wider colors. It’s important to note that, because the new gamut completely contains sRGB, values must be able to be negative as well as greater than 1. A completely saturated red in the display’s native color space (which is similar to P3) has negative components for green and blue.</span></div>
<div style="-webkit-text-stroke-color: rgb(0, 0, 0); -webkit-text-stroke-width: initial; font-family: 'Helvetica Neue'; font-size: 11px; line-height: normal; min-height: 13.1px;">
<span style="-webkit-font-kerning: none; font-size: 11pt;"></span><br /></div>
<div style="-webkit-text-stroke-color: rgb(0, 0, 0); -webkit-text-stroke-width: initial; font-family: 'Helvetica Neue'; font-size: 11px; line-height: normal;">
<span style="-webkit-font-kerning: none; font-size: 11pt;">The mechanism for enabling this is similar to OpenGL: you select a new special pixel format. The new pixel formats have “_XR” in their name, for “extended range.” These formats aren’t clamped to 0 - 1. sRGB also applies here; the new extended range pixel formats have sRGB variants, which perform a similar gamma function as before in OpenGL. This gamma function is <a href="https://developer.apple.com/documentation/coregraphics/kcgcolorspaceextendedsrgb?language=objc">extended</a> (in the natural way) to values greater than 1. For values less than 0, the gamma curve is flipped through the origin so that it curves downward (making it an <a href="https://en.m.wikipedia.org/wiki/Even_and_odd_functions">“odd” function</a>).</span></div>
<div style="-webkit-text-stroke-color: rgb(0, 0, 0); -webkit-text-stroke-width: initial; font-family: 'Helvetica Neue'; font-size: 11px; line-height: normal;">
<span style="-webkit-font-kerning: none; font-size: 11pt;"><br /></span>
<span style="-webkit-font-kerning: none; font-size: 11pt;">Using these new pixel formats causes your colors to go from 8 bits per channel to 10 bits per channel. The new 10 bits per channel colors are now signed (because they can go < 0), which means that there are 4 times as many representable values, and half of them are below 0, so the number of positive representable values doubled. In a non-sRGB variant, the maximum value is just around 2, but in an sRGB variant, the maximum value is greater than 2 because of the gamma curve.</span></div>
<div style="-webkit-text-stroke-color: rgb(0, 0, 0); -webkit-text-stroke-width: initial; font-family: 'Helvetica Neue'; font-size: 11px; line-height: normal; min-height: 13.1px;">
<span style="-webkit-font-kerning: none; font-size: 11pt;"></span><br /></div>
<div style="-webkit-text-stroke-color: rgb(0, 0, 0); -webkit-text-stroke-width: initial; font-family: 'Helvetica Neue'; font-size: 11px; line-height: normal;">
<span style="-webkit-font-kerning: none; font-size: 11pt;">On macOS, there is a way to explicitly tell the system how to interpret the color values in an NSWindow or a CALayer using the <a href="https://developer.apple.com/documentation/quartzcore/cametallayer/1478170-colorspace">colorspace property.</a> This works because there is a secondary pass which converts the pixels into the color space of the monitor. (Presumably iOS omits this pass for performance, leading to the restriction on which color spaces a pixel value can be represented in.) Therefore, to output colors using P3, simply assign the appropriate color space value to the CALayer you are using with Metal. If you do this, remember that “1.0” doesn’t represent sRGB’s 1.0; instead, it represents the most saturated color in the new color space. If you don’t also change your rendering code to compensate for this, your colors will be stretched across the gamut, leading to oversaturated colors and ugly renderings. You can solve this by setting the color space to the <a href="https://developer.apple.com/documentation/coregraphics/kcgcolorspaceextendedsrgb?language=objc">new “Extended sRGB”</a> color space of CGColor, which will give you the same rendering as iOS (while allowing values > 1.0). Note that if you do this, you can’t render to an integer pixel format, because those are clipped at 1.0; instead, you’ll have to render to a floating-point pixel format so that you can have values > 1.0.</span></div>
<div style="-webkit-text-stroke-color: rgb(0, 0, 0); -webkit-text-stroke-width: initial; font-family: 'Helvetica Neue'; font-size: 11px; line-height: normal; min-height: 13.1px;">
<span style="-webkit-font-kerning: none; font-size: 11pt;"></span><br /></div>
<br />
<div style="-webkit-text-stroke-color: rgb(0, 0, 0); -webkit-text-stroke-width: initial; font-family: 'Helvetica Neue'; font-size: 11px; line-height: normal;">
<span style="-webkit-font-kerning: none; font-size: 11pt;">So, on iOS, you have one switch which turns on both deep color and wide color, and on macOS, you have two switches, one of which turns on wide color and one of which turns on deep color.</span></div>
Litherumhttp://www.blogger.com/profile/12738405376090442005noreply@blogger.com0tag:blogger.com,1999:blog-8778351438463999796.post-33051704864860604632017-05-28T22:45:00.003-07:002017-05-28T23:11:12.595-07:00Chromaticity DiagramsHumans can see light with wavelengths between around 380 nm and 780 nm. We see many photons at a time, and we recognize the collection of photons as a particular color. Each photon has a wavelength, so a particular color is determined by how much power the photons carry at each visible wavelength. Put another way, a color is a distribution of power across the visible wavelengths of light. For example, if 1/3 of your power is at 700 nm and 2/3 of your power is at 400 nm, the color is a deep purple. This color can be described by the following function of wavelength:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiFKezAG6DeLmvFPdWj3pQ7G8MtRut6iLmvIZ6U6rRk3K_2trX8r6v7TRk7gby5EDGea9VJUeGQUyKZ8Kd9yxR6LvXJu-LXRAFuG_qbdzM631vBrTnvHRgGHbU1aK6iYsl5DxIQlILDF67F/s1600/Screen+Shot+2017-05-28+at+7.51.11+PM.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="758" data-original-width="1076" height="225" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiFKezAG6DeLmvFPdWj3pQ7G8MtRut6iLmvIZ6U6rRk3K_2trX8r6v7TRk7gby5EDGea9VJUeGQUyKZ8Kd9yxR6LvXJu-LXRAFuG_qbdzM631vBrTnvHRgGHbU1aK6iYsl5DxIQlILDF67F/s320/Screen+Shot+2017-05-28+at+7.51.11+PM.png" width="320" /></a></div>
<br />
<br />
Different curves over this domain represent different colors. Here is the curve for daylight:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgJ81lyP1-mKDQWCcXCMT2w0iMRG11kRcBrstxO37_wHOu7OwiO4xtTh7hq9bYcvMakYneq3re_0bJKUH7uhqP1JY4AD3os7HJbWSVABrWJZf7ax6vUNqC1JgdWy1uD6JrCwym-0in8aGeh/s1600/unnamed.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="410" data-original-width="591" height="221" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgJ81lyP1-mKDQWCcXCMT2w0iMRG11kRcBrstxO37_wHOu7OwiO4xtTh7hq9bYcvMakYneq3re_0bJKUH7uhqP1JY4AD3os7HJbWSVABrWJZf7ax6vUNqC1JgdWy1uD6JrCwym-0in8aGeh/s320/unnamed.jpg" width="320" /></a></div>
<br />
<br />
So, if we want to represent a color, we can describe the power function over the domain of visible wavelengths. However, we can do better if we include some biology.<br />
<br />
<h2>
Biology</h2>
We have three types of cells (called “cones”) in our eyes which react to light. Each of the three kinds of cones is sensitive to different wavelengths of light. Cones only exist in the center of our eye (the “fovea”) and not in our peripheral vision, so this model is only accurate for the colors we are directly looking at. Here is a graph of the sensitivities of the three kinds of cones:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiS1f0DddHdXfb3C4oaf8G9XFk8jv5vQWCZFIXrp-xgag9yK-OCAMIksBUzjNQTIlTOK3kjD5E5APsZvvfnGS2GrnzwoAk4LXwi68yX7H2YcnwZWyJtc5GsM7-63EU9f0SI7uhLE8geMRME/s1600/Screen+Shot+2017-05-28+at+7.56.16+PM.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="700" data-original-width="920" height="243" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiS1f0DddHdXfb3C4oaf8G9XFk8jv5vQWCZFIXrp-xgag9yK-OCAMIksBUzjNQTIlTOK3kjD5E5APsZvvfnGS2GrnzwoAk4LXwi68yX7H2YcnwZWyJtc5GsM7-63EU9f0SI7uhLE8geMRME/s320/Screen+Shot+2017-05-28+at+7.56.16+PM.png" width="320" /></a></div>
<br />
<br />
Here, you can see that the “S” cones are mostly sensitive to light at around 430 nm, but they still respond to light within a window of about 75 nm around it. You can also see that if all the light entering your eye is at 540 nm, the M cones will respond the most, the L cones will respond strongly (but not as much as the M cones), and the S cones will respond almost not at all.<br />
<br />
This means that the entire power distribution of light entering your eye is encoded as three values on its way to your brain. There are significantly fewer degrees of freedom in this encoding than there are in the source frequency distribution. This means that information is lost. Put another way, there are many different frequency distributions which get encoded the same way by your cones’ response.<br />
<br />
This is actually a really interesting finding. It means we can represent a color by three values instead of a whole function across the frequency spectrum, which is a significant space savings. It also means that if you have two colors which appear to “match,” their frequency distributions may not match, so if you modify both colors in the same way, they may cease to match.<br />
<br />
If you think about it, though, this is the principle that computer monitors and TVs use. They have phosphors in them which emit light at a particular frequency. When we watch TV, the frequency diagram of the light we are viewing contains three spikes at the three frequencies of phosphors. However, the images we see appear to match our idea of nature, which is represented by a much more continuous and flat frequency diagram. Somehow, the images we see on TV and the images we see in nature match.<br />
<br />
<h2>
Describing color</h2>
So a color can be represented by a triple of numbers: (response strength of S cones, response strength of M cones, response strength of L cones). Together, these three values can represent every color we can perceive.<br />
<br />
It would be great if we could simply represent a color by the response strength of each of the particular kinds of cones in our eyes; however, this is difficult to measure. Instead, let’s pick frequencies of light which we can easily produce. Let’s also select these frequencies such that they correspond as well as possible to each of the three cones. By varying the power these lights produce, we should be able to produce many of these triples, and therefore many of the colors we can see.<br />
<br />
In 1931, two experiments attempted to “match” colors using lights of 435.8 nm, 546.1 nm, and 700 nm (let’s call them the “blue,” “green,” and “red” lamps). The first two wavelengths are easily created using mercury vapor tubes, and correspond to the S and M cones, respectively. The last frequency corresponds to the L cones and, though it isn’t easily created with mercury, is insensitive to small errors because the L cones’ response curve is close to flat in this neighborhood.<br />
<br />
So, which colors should be matched? Every color can be decomposed into a collection of power values at particular frequencies. Therefore, if we could find a way to match every frequency in the range observable by humans, this data would be sufficient to match any color. For example, if you have a color with a peak at 680 nm and a peak at 400 nm, and you know that 680 nm light corresponds to lamp powers of (a, b, c) and 400 nm corresponds to lamp powers of (d, e, f), then (a + d, b + e, c + f) should match your color.<br />
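This additivity can be sketched directly: if each spectral component of a color has a known triple of lamp powers, the match for the whole color is the component-wise sum. The numbers below are made-up illustrative values, not real matching data.

```javascript
// Additivity of color matches: a color made of several spectral components
// is matched by the component-wise sum of each component's lamp powers.
// Each triple is [red, green, blue] lamp power; the values are hypothetical.
function combineMatches(matches) {
  return matches.reduce(
    (sum, m) => [sum[0] + m[0], sum[1] + m[1], sum[2] + m[2]],
    [0, 0, 0]
  );
}

// Hypothetical matches for the 680 nm and 400 nm components:
const match680 = [0.9, 0.1, 0.0]; // (a, b, c) in the text
const match400 = [0.0, 0.0, 0.8]; // (d, e, f) in the text
const combined = combineMatches([match680, match400]);
console.log(combined); // [0.9, 0.1, 0.8], i.e. (a + d, b + e, c + f)
```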
<br />
This was performed by two people, Guild and Wright, using 7 and 10 observers, respectively (and averaging the results). They went through every frequency of light in the visible range, and found how much power each of the lamps had to emit in order to match that frequency’s color.<br />
<br />
However, they found something a little upsetting. Consider the challenge of matching light at a wavelength of 510 nm. At this wavelength, we can see that the S cones respond near 0, and that the M cones respond maybe 20% more than the L cones. So, we are looking for how much power our primaries should emit to construct this same response in our cones.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgdQ6t5W7S1kZ_Gi_wCDwxAkiJpxAI4oV5GDkCwakqBvm71R3KZPyx0rUuZ9BOy62hijeGCDSOdFZ8139YlQUJGFpMUQgtf0xHlAdo4r0Um-kTMh8ZJRCSrb_kQ99r3GclbyyVJNUn2JoTy/s1600/Screen+Shot+2017-05-28+at+7.56.16+PM+copy.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="700" data-original-width="920" height="243" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgdQ6t5W7S1kZ_Gi_wCDwxAkiJpxAI4oV5GDkCwakqBvm71R3KZPyx0rUuZ9BOy62hijeGCDSOdFZ8139YlQUJGFpMUQgtf0xHlAdo4r0Um-kTMh8ZJRCSrb_kQ99r3GclbyyVJNUn2JoTy/s320/Screen+Shot+2017-05-28+at+7.56.16+PM+copy.png" width="320" /></a></div>
(The grey bars are our primaries, and the light blue bar is our target)<br />
<br />
Our primaries lie at 435.8 nm, 546.1 nm, and 700 nm. So, the blue lamp should be at or near 0; so far so good. If we select a power for the green lamp which gives us the correct M cone response, we find that it causes too high an L cone response (because of how the cones’ response curves overlap). Adding more of the red light only makes the problem worse. Therefore, because the cones overlap, it is impossible to achieve a color match at this wavelength using these primaries.<br />
<br />
The solution is to subtract the red light instead of adding it. The reason we couldn’t find a match before is that our green lamp added too much L cone response. If we could remove some of the L cone’s response, our color would match. We can do this by, instead of matching against the 510 nm light alone, matching against the sum of the 510 nm light plus some of our red lamp. This has the effect of subtracting out some of the L cone response, and lets us match our color.<br />
<br />
Using this approach, we can construct a graph, where for each wavelength, the three powers of the three lights are plotted. It will include negative values where matches would otherwise be impossible.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiX4XNHUG7_TNfq73k4beWDbWGBPzlFp68D-Yq4TtnQ3lP0Z5Cv0HD9Bm0qAmtZT8yqrJo2aZ2eL-4ne-6F2INL9IsGiaEXwjCWLNPWTm81-JQO-jdKCNIHkGpf4lZnWQNrTVs1W2j2yI_p/s1600/Screen+Shot+2017-05-28+at+8.18.54+PM.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="816" data-original-width="1284" height="203" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiX4XNHUG7_TNfq73k4beWDbWGBPzlFp68D-Yq4TtnQ3lP0Z5Cv0HD9Bm0qAmtZT8yqrJo2aZ2eL-4ne-6F2INL9IsGiaEXwjCWLNPWTm81-JQO-jdKCNIHkGpf4lZnWQNrTVs1W2j2yI_p/s320/Screen+Shot+2017-05-28+at+8.18.54+PM.png" width="320" /></a></div>
<br />
<br />
<h2>
X Y Z color space</h2>
Once we have this, we can now represent any color by a (possibly negative) triple, where each value in the triple represents the power of one of our primaries. However, the fact that these values can be negative kind of sucks. In particular, machines were created which can measure colors, but those machines would have to be more complicated if some of the values could be negative.<br />
<br />
Luckily, light obeys the rules of a vector space: addition and scalar multiplication behave consistently. Color A plus color B yields a consistent result, no matter which frequency distribution color A is represented by, and the same is true for scaling a color’s power. This means that we are actually dealing with a vector space, and a vector space can be transformed via a linear transformation.<br />
<br />
So, the power values of each of the lights at each frequency were transformed such that the resulting values were non-negative at every frequency. This new, transformed vector space is called X Y Z, and its primaries are not physically based.<br />
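A change of primaries like this is just a 3×3 matrix applied to each triple. As an illustration of the mechanics (not the historical 1931 transformation), here is a matrix-vector multiply using the linear-sRGB-to-XYZ matrix that also appears in the demo later in this post:

```javascript
// Converting a color triple between sets of primaries is a 3x3 matrix
// multiplication. This particular matrix maps linear sRGB to XYZ.
const linearSRGBToXYZ = [
  [0.4124, 0.3576, 0.1805],
  [0.2126, 0.7152, 0.0722],
  [0.0193, 0.1192, 0.9505],
];

function transform(matrix, [a, b, c]) {
  return matrix.map(row => row[0] * a + row[1] * b + row[2] * c);
}

// Linear-sRGB white (1, 1, 1) lands at the D65 white point in XYZ:
console.log(transform(linearSRGBToXYZ, [1, 1, 1])); // ≈ [0.9505, 1.0, 1.089]
```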
<br />
Given these new non-physical primaries, you can construct a similar graph. It shows, for each frequency of light, how much of each primary is necessary to represent that frequency.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgTBocHov8oYeCxhdfQDRLPHPoKNZpmCSEdnxcvLJ4EKkEiTNdCZhQQjpzJEFULNVGtA4HTlQHwklGnwcq19L_PNa94HI2T7vDsiYikWiOM_fb0iWpLq2Blf-3y1rrMJdLAaMifizow-8ql/s1600/Screen+Shot+2017-05-28+at+8.14.12+PM.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="926" data-original-width="1522" height="194" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgTBocHov8oYeCxhdfQDRLPHPoKNZpmCSEdnxcvLJ4EKkEiTNdCZhQQjpzJEFULNVGtA4HTlQHwklGnwcq19L_PNa94HI2T7vDsiYikWiOM_fb0iWpLq2Blf-3y1rrMJdLAaMifizow-8ql/s320/Screen+Shot+2017-05-28+at+8.14.12+PM.png" width="320" /></a></div>
<br />
<br />
<h2>
Chromaticity graphs</h2>
So, for each frequency of light, we have an associated (X, Y, Z) triple. Let’s plot it on a 3-D graph!<br />
<br />
<div>
<input checked="" id="axes" type="checkbox" /><label for="axes">Axes</label></div>
<div>
<input checked="" id="xyz" type="checkbox" /><label for="xyz">XYZ Triplets</label></div>
<div>
<input checked="" id="xyzProjection" type="checkbox" /><label for="xyzProjection">Projection of XYZ Triplets onto X+Y+Z=1 plane</label></div>
<div>
<input id="srgbCube" type="checkbox" /><label for="srgbCube">sRGB Cube</label></div>
<div>
<input checked="" id="srgbProjection" type="checkbox" /><label for="srgbProjection">Projection of sRGB Primaries onto X+Y+Z=1 plane</label></div>
<div>
<input id="p3Projection" type="checkbox" /><label for="p3Projection">Projection of DCI-P3 Primaries onto X+Y+Z=1 plane</label></div>
<canvas height="400" id="canvas" width="400"></canvas>
<script id="vertexShader" type="x-shader/x-vertex">
uniform mat4 modelViewProjectionMatrix;
attribute vec3 position;
void main() {
gl_Position = modelViewProjectionMatrix * vec4(position, 1);
}
</script>
<script id="projectionVertexShader" type="x-shader/x-vertex">
uniform mat4 modelViewProjectionMatrix;
attribute vec3 position;
void main() {
vec3 projectedPosition = position / (position.x + position.y + position.z);
gl_Position = modelViewProjectionMatrix * vec4(projectedPosition, 1);
}
</script>
<script id="sRGBVertexShader" type="x-shader/x-vertex">
uniform mat4 modelViewProjectionMatrix;
varying vec3 xyzCoordinates;
attribute vec3 position;
void main() {
xyzCoordinates = position;
gl_Position = modelViewProjectionMatrix * vec4(position, 1);
}
</script>
<script id="fragmentShader" type="x-shader/x-fragment">
precision mediump float;
uniform vec4 color;
void main() {
gl_FragColor = color;
}
</script>
<script id="sRGBFragmentShader" type="x-shader/x-fragment">
precision mediump float;
varying vec3 xyzCoordinates;
void main() {
vec3 column0 = vec3(3.2406, -0.9689, 0.0557);
vec3 column1 = vec3(-1.5372, 1.8758, -0.2040);
vec3 column2 = vec3(-0.4986, -0.0415, 1.0570);
mat3 conversionMatrix = mat3(column0, column1, column2);
vec3 linearSRGBColor = conversionMatrix * xyzCoordinates;
// The triangle we are rendering is the projection of the corners of the sRGB cube onto the X + Y + Z = 1 plane, which
// totally could be outside sRGB. The default behavior is to clamp each component individually, but we don't want that.
// Instead, we want to scale the result until we are in-gamut.
vec3 colorForDrawing = linearSRGBColor / (linearSRGBColor.x + linearSRGBColor.y + linearSRGBColor.z);
// "Conventionally, OpenGL assumes framebuffer color components are stored in a linear color space."
// So we need to output linear sRGB, and then hope the OS fixes it up on its way to the monitor.
gl_FragColor = vec4(colorForDrawing, 0.85);
}
</script>
<script>
function visualizeXYZ(canvas, vertexShaderElement, projectionVertexShaderElement, sRGBVertexShaderElement, fragmentShaderElement, sRGBFragmentShaderElement) {
var data = [
0.0014, 0.0000, 0.0065,
0.0022, 0.0001, 0.0105,
0.0042, 0.0001, 0.0201,
0.0076, 0.0002, 0.0362,
0.0143, 0.0004, 0.0679,
0.0232, 0.0006, 0.1102,
0.0435, 0.0012, 0.2074,
0.0776, 0.0022, 0.3713,
0.1344, 0.0040, 0.6456,
0.2148, 0.0073, 1.0391,
0.2839, 0.0116, 1.3856,
0.3285, 0.0168, 1.6230,
0.3483, 0.0230, 1.7471,
0.3481, 0.0298, 1.7826,
0.3362, 0.0380, 1.7721,
0.3187, 0.0480, 1.7441,
0.2908, 0.0600, 1.6692,
0.2511, 0.0739, 1.5281,
0.1954, 0.0910, 1.2876,
0.1421, 0.1126, 1.0419,
0.0956, 0.1390, 0.8130,
0.0580, 0.1693, 0.6162,
0.0320, 0.2080, 0.4652,
0.0147, 0.2586, 0.3533,
0.0049, 0.3230, 0.2720,
0.0024, 0.4073, 0.2123,
0.0093, 0.5030, 0.1582,
0.0291, 0.6082, 0.1117,
0.0633, 0.7100, 0.0782,
0.1096, 0.7932, 0.0573,
0.1655, 0.8620, 0.0422,
0.2257, 0.9149, 0.0298,
0.2904, 0.9540, 0.0203,
0.3597, 0.9803, 0.0134,
0.4334, 0.9950, 0.0087,
0.5121, 1.0000, 0.0057,
0.5945, 0.9950, 0.0039,
0.6784, 0.9786, 0.0027,
0.7621, 0.9520, 0.0021,
0.8425, 0.9154, 0.0018,
0.9163, 0.8700, 0.0017,
0.9786, 0.8163, 0.0014,
1.0263, 0.7570, 0.0011,
1.0567, 0.6949, 0.0010,
1.0622, 0.6310, 0.0008,
1.0456, 0.5668, 0.0006,
1.0026, 0.5030, 0.0003,
0.9384, 0.4412, 0.0002,
0.8544, 0.3810, 0.0002,
0.7514, 0.3210, 0.0001,
0.6424, 0.2650, 0.0000,
0.5419, 0.2170, 0.0000,
0.4479, 0.1750, 0.0000,
0.3608, 0.1382, 0.0000,
0.2835, 0.1070, 0.0000,
0.2187, 0.0816, 0.0000,
0.1649, 0.0610, 0.0000,
0.1212, 0.0446, 0.0000,
0.0874, 0.0320, 0.0000,
0.0636, 0.0232, 0.0000,
0.0468, 0.0170, 0.0000,
0.0329, 0.0119, 0.0000,
0.0227, 0.0082, 0.0000,
0.0158, 0.0057, 0.0000,
0.0114, 0.0041, 0.0000,
0.0081, 0.0029, 0.0000,
0.0058, 0.0021, 0.0000,
0.0041, 0.0015, 0.0000,
0.0029, 0.0010, 0.0000,
0.0020, 0.0007, 0.0000,
0.0014, 0.0005, 0.0000,
0.0010, 0.0004, 0.0000,
0.0007, 0.0002, 0.0000,
0.0005, 0.0002, 0.0000,
0.0003, 0.0001, 0.0000,
0.0002, 0.0001, 0.0000,
0.0002, 0.0001, 0.0000,
0.0001, 0.00002, 0.0000,
];
function onContextLost() {
}
function onContextRestored() {
}
function crossProduct(u, v) {
return [u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0]];
}
function unitVector(v) {
var length = Math.sqrt(Math.pow(v[0], 2) + Math.pow(v[1], 2) + Math.pow(v[2], 2));
return [v[0] / length, v[1] / length, v[2] / length];
}
function constructLookAtMatrix(eyePosition, centerPosition, upVector) {
var f = [centerPosition[0] - eyePosition[0], centerPosition[1] - eyePosition[1], centerPosition[2] - eyePosition[2]];
var fUnit = unitVector(f);
var upUnit = unitVector(upVector);
var s = crossProduct(fUnit, upUnit);
var sUnit = unitVector(s);
var u = crossProduct(sUnit, fUnit);
function dot(a, b) { return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]; }
// The rotation rows are the camera's basis vectors; the translation column
// is the rotated -eyePosition (i.e. -dot(basis, eye)), as in gluLookAt.
return DOMMatrix.fromMatrix({
m11: sUnit[0], m21: sUnit[1] , m31: sUnit[2] , m41: -dot(sUnit, eyePosition),
m12: u[0], m22: u[1] , m32: u[2] , m42: -dot(u, eyePosition),
m13: -fUnit[0], m23: -fUnit[1], m33: -fUnit[2], m43: dot(fUnit, eyePosition),
m14: 0 , m24: 0 , m34: 0 , m44: 1,
});
}
function constructPerspectiveMatrix(yFieldOfView, aspectRatio, nearPlaneDistance, farPlaneDistance) {
var f = 1 / Math.tan(yFieldOfView / 2);
var m11 = f / aspectRatio
var m22 = f;
var m33 = (farPlaneDistance + nearPlaneDistance) / (nearPlaneDistance - farPlaneDistance);
var m43 = (2 * farPlaneDistance * nearPlaneDistance) / (nearPlaneDistance - farPlaneDistance);
return DOMMatrix.fromMatrix({
m11: m11, m21: 0 , m31: 0 , m41: 0 ,
m12: 0 , m22: m22, m32: 0 , m42: 0 ,
m13: 0 , m23: 0 , m33: m33, m43: m43,
m14: 0 , m24: 0 , m34: -1 , m44: 1 ,
});
}
function constructModelViewProjectionMatrix(aspectRatio, angle) {
var modelToWorldMatrix = new DOMMatrix();
modelToWorldMatrix.rotateAxisAngleSelf(0, 1, 0, angle);
modelToWorldMatrix.translateSelf(-0.5, -0.5, -0.5);
var worldToCameraMatrix = constructLookAtMatrix([0, 0.15, 1], [0, 0, 0], [0, 1, 0]);
var projectionMatrix = constructPerspectiveMatrix(0.8 * (Math.PI / 2), aspectRatio, 0.1, 10);
return projectionMatrix.multiply(worldToCameraMatrix).multiply(modelToWorldMatrix);
}
function matrixToSequence(matrix) {
// Column major
return [matrix.m11, matrix.m12, matrix.m13, matrix.m14,
matrix.m21, matrix.m22, matrix.m23, matrix.m24,
matrix.m31, matrix.m32, matrix.m33, matrix.m34,
matrix.m41, matrix.m42, matrix.m43, matrix.m44];
}
var theta = 0;
var dragging = false;
var context;
var program;
var projectionProgram;
var sRGBProgram;
var vertexBufferAttribLocation;
var projectionVertexBufferAttribLocation;
var sRGBVertexBufferAttribLocation;
var axesVertexBuffer;
var dataVertexBuffer;
var sRGBVertexBuffer;
var sRGBCubeVertexBuffer;
var p3VertexBuffer;
var modelViewProjectionMatrixLocation;
var colorLocation;
var projectionModelViewProjectionMatrixLocation;
var projectionColorLocation;
var sRGBModelViewProjectionMatrixLocation;
var previousTime;
var dataVertices;
var drawAxes;
var drawXYZ;
var drawXYZProjection;
var drawSRGBCube;
var drawSRGBProjection;
var drawP3Projection;
function draw(timeDelta, aspectRatio) {
context.clear(context.COLOR_BUFFER_BIT | context.DEPTH_BUFFER_BIT);
if (!dragging)
theta += timeDelta * 0.05;
var modelViewProjectionMatrix = matrixToSequence(constructModelViewProjectionMatrix(aspectRatio, theta));
context.useProgram(program);
context.bindBuffer(context.ARRAY_BUFFER, axesVertexBuffer);
context.vertexAttribPointer(vertexBufferAttribLocation, 3, context.FLOAT, false, 0, 0);
context.uniformMatrix4fv(modelViewProjectionMatrixLocation, false, modelViewProjectionMatrix);
context.uniform4fv(colorLocation, [1, 0, 0, 1]);
if (drawAxes)
context.drawArrays(context.LINES, 0, 6);
context.bindBuffer(context.ARRAY_BUFFER, sRGBCubeVertexBuffer);
context.vertexAttribPointer(vertexBufferAttribLocation, 3, context.FLOAT, false, 0, 0);
context.uniform4fv(colorLocation, [0, 0, 1, 1]);
if (drawSRGBCube)
context.drawArrays(context.LINES, 0, 24);
context.bindBuffer(context.ARRAY_BUFFER, p3VertexBuffer);
context.vertexAttribPointer(vertexBufferAttribLocation, 3, context.FLOAT, false, 0, 0);
context.uniform4fv(colorLocation, [1, 0, 1, 1]);
if (drawP3Projection)
context.drawArrays(context.LINE_LOOP, 0, 3);
context.bindBuffer(context.ARRAY_BUFFER, dataVertexBuffer);
context.vertexAttribPointer(vertexBufferAttribLocation, 3, context.FLOAT, false, 0, 0);
context.uniform4fv(colorLocation, [1, 1, 1, 1]);
if (drawXYZ)
context.drawArrays(context.LINE_LOOP, 0, dataVertices);
context.useProgram(projectionProgram);
context.bindBuffer(context.ARRAY_BUFFER, dataVertexBuffer);
context.vertexAttribPointer(projectionVertexBufferAttribLocation, 3, context.FLOAT, false, 0, 0);
context.uniformMatrix4fv(projectionModelViewProjectionMatrixLocation, false, modelViewProjectionMatrix);
context.uniform4fv(projectionColorLocation, [0, 1, 0, 1]);
if (drawXYZProjection)
context.drawArrays(context.LINE_LOOP, 0, dataVertices);
context.useProgram(sRGBProgram);
context.bindBuffer(context.ARRAY_BUFFER, sRGBVertexBuffer);
context.vertexAttribPointer(sRGBVertexBufferAttribLocation, 3, context.FLOAT, false, 0, 0);
context.uniformMatrix4fv(sRGBModelViewProjectionMatrixLocation, false, modelViewProjectionMatrix);
if (drawSRGBProjection)
context.drawArrays(context.TRIANGLES, 0, 3);
}
function tick(time) {
if (previousTime == undefined)
previousTime = time;
var aspectRatio = canvas.clientWidth / canvas.clientHeight;
context.viewport(0, 0, canvas.clientWidth, canvas.clientHeight);
draw(time - previousTime, aspectRatio);
previousTime = time;
window.requestAnimationFrame(tick);
}
var offsetX;
var offsetY;
function onMouseDrag(event) {
var newX = event.offsetX;
var newY = event.offsetY;
var deltaX = newX - offsetX;
var deltaY = newY - offsetY;
theta += deltaX;
offsetX = newX;
offsetY = newY;
}
function onMouseUp() {
dragging = false;
canvas.removeEventListener("mousemove", onMouseDrag, false);
canvas.removeEventListener("mouseup", onMouseUp, false);
}
function onMouseDown(event) {
if (!dragging) {
dragging = true;
canvas.addEventListener("mousemove", onMouseDrag, false);
canvas.addEventListener("mouseup", onMouseUp, false);
offsetX = event.offsetX;
offsetY = event.offsetY;
}
}
function setDrawBooleans() {
drawAxes = document.getElementById("axes").checked;
drawXYZ = document.getElementById("xyz").checked;
drawXYZProjection = document.getElementById("xyzProjection").checked;
drawSRGBCube = document.getElementById("srgbCube").checked;
drawSRGBProjection = document.getElementById("srgbProjection").checked;
drawP3Projection = document.getElementById("p3Projection").checked;
}
function start() {
canvas.addEventListener("mousedown", onMouseDown, false);
context = canvas.getContext("webgl");
setDrawBooleans();
document.getElementById("axes").addEventListener("change", setDrawBooleans, false);
document.getElementById("xyz").addEventListener("change", setDrawBooleans, false);
document.getElementById("xyzProjection").addEventListener("change", setDrawBooleans, false);
document.getElementById("srgbCube").addEventListener("change", setDrawBooleans, false);
document.getElementById("srgbProjection").addEventListener("change", setDrawBooleans, false);
document.getElementById("p3Projection").addEventListener("change", setDrawBooleans, false);
canvas.addEventListener("webglcontextlost", onContextLost, false);
canvas.addEventListener("webglcontextrestored", onContextRestored, false);
var vertexShader = context.createShader(context.VERTEX_SHADER);
context.shaderSource(vertexShader, vertexShaderElement.text);
context.compileShader(vertexShader);
var compiled = context.getShaderParameter(vertexShader, context.COMPILE_STATUS);
var projectionVertexShader = context.createShader(context.VERTEX_SHADER);
context.shaderSource(projectionVertexShader, projectionVertexShaderElement.text);
context.compileShader(projectionVertexShader);
compiled = context.getShaderParameter(projectionVertexShader, context.COMPILE_STATUS);
var sRGBVertexShader = context.createShader(context.VERTEX_SHADER);
context.shaderSource(sRGBVertexShader, sRGBVertexShaderElement.text);
context.compileShader(sRGBVertexShader);
compiled = context.getShaderParameter(sRGBVertexShader, context.COMPILE_STATUS);
var fragmentShader = context.createShader(context.FRAGMENT_SHADER);
context.shaderSource(fragmentShader, fragmentShaderElement.text);
context.compileShader(fragmentShader);
compiled = context.getShaderParameter(fragmentShader, context.COMPILE_STATUS);
var sRGBFragmentShader = context.createShader(context.FRAGMENT_SHADER);
context.shaderSource(sRGBFragmentShader, sRGBFragmentShaderElement.text);
context.compileShader(sRGBFragmentShader);
compiled = context.getShaderParameter(sRGBFragmentShader, context.COMPILE_STATUS);
program = context.createProgram();
context.attachShader(program, vertexShader);
context.attachShader(program, fragmentShader);
context.linkProgram(program);
var linked = context.getProgramParameter(program, context.LINK_STATUS);
projectionProgram = context.createProgram();
context.attachShader(projectionProgram, projectionVertexShader);
context.attachShader(projectionProgram, fragmentShader);
context.linkProgram(projectionProgram);
linked = context.getProgramParameter(projectionProgram, context.LINK_STATUS);
sRGBProgram = context.createProgram();
context.attachShader(sRGBProgram, sRGBVertexShader);
context.attachShader(sRGBProgram, sRGBFragmentShader);
context.linkProgram(sRGBProgram);
linked = context.getProgramParameter(sRGBProgram, context.LINK_STATUS);
var vertices = new Float32Array([0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1]);
axesVertexBuffer = context.createBuffer();
context.bindBuffer(context.ARRAY_BUFFER, axesVertexBuffer);
context.bufferData(context.ARRAY_BUFFER, vertices, context.STATIC_DRAW);
dataVertexBuffer = context.createBuffer();
context.bindBuffer(context.ARRAY_BUFFER, dataVertexBuffer);
context.bufferData(context.ARRAY_BUFFER, Float32Array.from(data), context.STATIC_DRAW);
dataVertices = data.length / 3;
var linearSRGBToXYZTransformation = DOMMatrix.fromMatrix({
m11: 0.4124, m21: 0.3576, m31: 0.1805,
m12: 0.2126, m22: 0.7152, m32: 0.0722,
m13: 0.0193, m23: 0.1192, m33: 0.9505,
});
var corner0 = [linearSRGBToXYZTransformation.m11, linearSRGBToXYZTransformation.m12, linearSRGBToXYZTransformation.m13];
var corner1 = [linearSRGBToXYZTransformation.m21, linearSRGBToXYZTransformation.m22, linearSRGBToXYZTransformation.m23];
var corner2 = [linearSRGBToXYZTransformation.m31, linearSRGBToXYZTransformation.m32, linearSRGBToXYZTransformation.m33];
var corner0Sum = corner0[0] + corner0[1] + corner0[2];
var corner1Sum = corner1[0] + corner1[1] + corner1[2];
var corner2Sum = corner2[0] + corner2[1] + corner2[2];
var projectedCorner0 = [corner0[0] / corner0Sum, corner0[1] / corner0Sum, corner0[2] / corner0Sum];
var projectedCorner1 = [corner1[0] / corner1Sum, corner1[1] / corner1Sum, corner1[2] / corner1Sum];
var projectedCorner2 = [corner2[0] / corner2Sum, corner2[1] / corner2Sum, corner2[2] / corner2Sum];
var sRGBVertices = new Float32Array([projectedCorner0[0], projectedCorner0[1], projectedCorner0[2], projectedCorner1[0], projectedCorner1[1], projectedCorner1[2], projectedCorner2[0], projectedCorner2[1], projectedCorner2[2]]);
sRGBVertexBuffer = context.createBuffer();
context.bindBuffer(context.ARRAY_BUFFER, sRGBVertexBuffer);
context.bufferData(context.ARRAY_BUFFER, sRGBVertices, context.STATIC_DRAW);
var sRGBWhite = [corner0[0] + corner1[0] + corner2[0], corner0[1] + corner1[1] + corner2[1], corner0[2] + corner1[2] + corner2[2]];
var sRGBCubeVertices = new Float32Array([
0, 0, 0,
corner0[0], corner0[1], corner0[2],
0, 0, 0,
corner1[0], corner1[1], corner1[2],
0, 0, 0,
corner2[0], corner2[1], corner2[2],
corner0[0], corner0[1], corner0[2],
corner0[0] + corner1[0], corner0[1] + corner1[1], corner0[2] + corner1[2],
corner1[0], corner1[1], corner1[2],
corner0[0] + corner1[0], corner0[1] + corner1[1], corner0[2] + corner1[2],
corner1[0], corner1[1], corner1[2],
corner1[0] + corner2[0], corner1[1] + corner2[1], corner1[2] + corner2[2],
corner2[0], corner2[1], corner2[2],
corner1[0] + corner2[0], corner1[1] + corner2[1], corner1[2] + corner2[2],
corner0[0], corner0[1], corner0[2],
corner0[0] + corner2[0], corner0[1] + corner2[1], corner0[2] + corner2[2],
corner2[0], corner2[1], corner2[2],
corner0[0] + corner2[0], corner0[1] + corner2[1], corner0[2] + corner2[2],
sRGBWhite[0], sRGBWhite[1], sRGBWhite[2],
corner0[0] + corner1[0], corner0[1] + corner1[1], corner0[2] + corner1[2],
sRGBWhite[0], sRGBWhite[1], sRGBWhite[2],
corner1[0] + corner2[0], corner1[1] + corner2[1], corner1[2] + corner2[2],
sRGBWhite[0], sRGBWhite[1], sRGBWhite[2],
corner0[0] + corner2[0], corner0[1] + corner2[1], corner0[2] + corner2[2],
]);
sRGBCubeVertexBuffer = context.createBuffer();
context.bindBuffer(context.ARRAY_BUFFER, sRGBCubeVertexBuffer);
context.bufferData(context.ARRAY_BUFFER, sRGBCubeVertices, context.STATIC_DRAW);
var p3Vertices = new Float32Array([0.680, 0.320, 1 - (0.680 + 0.320), 0.265, 0.690, 1 - (0.265 + 0.690), 0.150, 0.060, 1 - (0.150 + 0.060)]);
p3VertexBuffer = context.createBuffer();
context.bindBuffer(context.ARRAY_BUFFER, p3VertexBuffer);
context.bufferData(context.ARRAY_BUFFER, p3Vertices, context.STATIC_DRAW);
context.useProgram(program);
vertexBufferAttribLocation = context.getAttribLocation(program, "position");
context.enableVertexAttribArray(vertexBufferAttribLocation);
modelViewProjectionMatrixLocation = context.getUniformLocation(program, "modelViewProjectionMatrix");
colorLocation = context.getUniformLocation(program, "color");
context.useProgram(projectionProgram);
projectionVertexBufferAttribLocation = context.getAttribLocation(projectionProgram, "position");
context.enableVertexAttribArray(projectionVertexBufferAttribLocation);
projectionModelViewProjectionMatrixLocation = context.getUniformLocation(projectionProgram, "modelViewProjectionMatrix");
projectionColorLocation = context.getUniformLocation(projectionProgram, "color");
context.useProgram(sRGBProgram);
sRGBVertexBufferAttribLocation = context.getAttribLocation(sRGBProgram, "position");
context.enableVertexAttribArray(sRGBVertexBufferAttribLocation);
sRGBModelViewProjectionMatrixLocation = context.getUniformLocation(sRGBProgram, "modelViewProjectionMatrix");
context.lineWidth(3);
context.clearColor(0, 0, 0, 1);
context.enable(context.DEPTH_TEST);
context.enable(context.BLEND);
context.blendFunc(context.SRC_ALPHA, context.ONE_MINUS_SRC_ALPHA);
window.requestAnimationFrame(tick);
var error = context.getError();
}
start();
}
window.addEventListener("load", function() {
visualizeXYZ(document.getElementById("canvas"), document.getElementById("vertexShader"), document.getElementById("projectionVertexShader"), document.getElementById("sRGBVertexShader"), document.getElementById("fragmentShader"), document.getElementById("sRGBFragmentShader"));
}, false);
</script>
<br />
(Best viewed in Safari Nightly build.)<br />
Click and drag to control the rotation of the graph!<br />
<br />
The white curve is our collection of (X, Y, Z) triples. (The red is our unit axes.)<br />
<br />
Remember that every visible color is represented as a linear combination of the vectors from the origin to points on this (one-dimensional) curve.<br />
<br />
The origin is black, because it represents 0 power. The hue and saturation of a color are described by the direction from the origin to its point, not by the point’s distance from the origin. If we want to represent this space in two dimensions, it makes sense to eliminate the brightness component and instead only show hue and saturation. This can be done by projecting each point onto the X + Y + Z = 1 plane, shown in green on the above chart.<br />
<br />
Note that this shape is convex. This is particularly interesting: any point on the contour of the shape, plus any other point on the contour, yields a point within the interior of the shape (when projected back onto our projection plane). Recall that all visible colors are linear combinations of points on the contour of the curve. Therefore, the visible colors are exactly the points in the interior of this curve. Points exterior to this curve would represent colors with a negative response for at least one type of cone (which cannot happen).<br />
<br />
So, the inside of this horseshoe shape represents every visible color. This shape is usually visualized by projecting it down to the (X, Y) plane. That yields the familiar diagram:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhgZnw84xXJD8gXPcKlFQfhDxlCkJFt-xHsP-Qyp6clBTrrnXL2lZaVYc8DpgVVvtyrzwUueAu_JRGW5izU0afNmGcRUGRIl-_6D5ayQDjQhGQKQXImwPkxD0AppkqY-hL_Bqt2EYHrnUCh/s1600/Screen+Shot+2017-05-28+at+10.38.47+PM.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="692" data-original-width="670" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhgZnw84xXJD8gXPcKlFQfhDxlCkJFt-xHsP-Qyp6clBTrrnXL2lZaVYc8DpgVVvtyrzwUueAu_JRGW5izU0afNmGcRUGRIl-_6D5ayQDjQhGQKQXImwPkxD0AppkqY-hL_Bqt2EYHrnUCh/s320/Screen+Shot+2017-05-28+at+10.38.47+PM.png" width="309" /></a></div>
<br />
<br />
The inside of this horseshoe represents every color we can see. Also, notice that, because X Y Z was constructed so that every visible color has positive coordinates, and the projection we are viewing is onto the X + Y + Z = 1 plane, all the points on the diagram lie below the X + Y = 1 line.<br />
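The projection behind these 2-D diagrams is simple arithmetic: divide each coordinate by the sum X + Y + Z, then keep (x, y). A minimal sketch, tested here with the XYZ coordinates of the sRGB red primary, which should land at the well-known chromaticity (0.64, 0.33):

```javascript
// Project an XYZ triple onto the X + Y + Z = 1 plane, then drop z.
// The resulting (x, y) pair is the color's chromaticity: hue and
// saturation with the brightness divided out.
function chromaticity([X, Y, Z]) {
  const sum = X + Y + Z;
  return [X / sum, Y / sum];
}

// XYZ of the sRGB red primary (first column of the matrix in the demo):
const [x, y] = chromaticity([0.4124, 0.2126, 0.0193]);
console.log(x.toFixed(2), y.toFixed(2)); // 0.64 0.33
```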
<br />
<h2>
Color spaces</h2>
A color space is usually represented as three primary colors, as well as a white point (or a maximum bound on the magnitude of each primary color). The colors in the color space are usually represented as a linear combination of the primary colors (subject to some maximum). In our chromaticity diagram, we aren’t concerned with brightness, so we can ignore these maximum values (and associated white points). Because we know the representable colors in a color space are linear combinations of the primaries, we can plot the primaries in X Y Z color space and project them onto the same X + Y + Z = 1 plane. Using the same logic we used above, we know that the representable colors in the color space are in the interior of the triangle realized by this projection.<br />
<br />
You can see the result of this projection for the primaries of sRGB in the shaded triangle in the above chart. As you can see, there are many colors that human eyes can see which aren’t representable within sRGB. The chart also allows you to toggle the bounding triangle for the DCI-P3 primaries, which Apple recently adopted (as Display P3) on some of its devices. You can see how P3 includes more colors than sRGB.<br />
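Deciding whether a given chromaticity is representable in a color space boils down to a standard point-in-triangle test against the primaries' chromaticities. The sketch below uses the sRGB primaries (0.64, 0.33), (0.30, 0.60), (0.15, 0.06) as the triangle, with the D65 white point as an in-gamut example and a very saturated green as an out-of-gamut one:

```javascript
// A chromaticity is representable in a three-primary color space iff it
// lies inside the triangle formed by the primaries' chromaticities.
// Sign-of-cross-product point-in-triangle test.
function cross(o, a, b) {
  return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0]);
}

function insideTriangle(p, a, b, c) {
  const d1 = cross(a, b, p);
  const d2 = cross(b, c, p);
  const d3 = cross(c, a, p);
  const hasNegative = d1 < 0 || d2 < 0 || d3 < 0;
  const hasPositive = d1 > 0 || d2 > 0 || d3 > 0;
  return !(hasNegative && hasPositive); // p is on the same side of all edges
}

// sRGB primaries as (x, y) chromaticities:
const red = [0.64, 0.33], green = [0.30, 0.60], blue = [0.15, 0.06];

console.log(insideTriangle([0.3127, 0.3290], red, green, blue)); // true: D65 white
console.log(insideTriangle([0.2, 0.7], red, green, blue));       // false: out of gamut
```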
<br />
Because the shape of all visible colors isn’t a triangle, it isn’t possible to create a color space where each primary is a visible color and the color space encompasses every visible color. If your color space encompasses every visible color, the primaries must lie outside of the horseshoe and are therefore not visible. If your primaries lie inside the horseshoe, there are visible colors which cannot be captured by your primaries. Having your primaries be real physical colors is valuable so that you can, for example, actually build physical devices which emit your primaries (like the phosphors in a computer monitor). You can get closer to encompassing every visible color if you increase the number of primaries to 4 or 5, at the cost of making each color’s representation “fatter” (one extra component per extra primary).<br />
<br />
Keep in mind that these chromaticity diagrams (the 2-D diagrams above) are only useful for plotting individual points. Specifically, 2-D distances across the diagram are not meaningful: points that are close together on the diagram may not be visually similar, and points which are visually similar may not be close together on the diagram.<br />
<br />
Also, when reading these horseshoe graphs, realize that they are simply projections of a 3-D graph onto a somewhat arbitrary plane. A better visualization of color would include all three dimensions.<br />
<br />
<h2>
Relationship Between Glyphs and Code Points</h2>
(Posted 2017-05-20.)<br />
<br />
Recently, there have been some discussions about various Unicode concepts like surrogate pairs, variation selectors, and combining clusters, but I thought I could shed some light on how these pieces all fit together and their relationship with the things we actually see on screen.<br />
<br />
<b style="font-size: larger;">tl;dr: The relationship between what you see on the screen and the unicode string behind it is completely arbitrary.</b><br />
<br />
The biggest piece to understand is the difference between Unicode's specs and the contents of font files. A string is a sequence of code points. Certain code points have certain meanings, which can affect things like the width of the rendered string, caret placement, and editing commands. Once you have a string and you want to render it, you partition it into runs, where each run can be rendered with a single font (and has other properties, like a single direction throughout the run). You then map code points to glyphs one-to-one, and then you run a Turing-complete "shaping" pass over the sequence of glyphs and advances. Once you've got your shaped glyphs and advances, you can finally render them.<br />
<br />
<h2>
Code Points</h2>
<br />
Alright, so what's a code point? A code point is just a number. There are many specs which describe a mapping of number to meaning. Most of them are language-specific, which makes sense because, in any given document, there will likely only be a single language. For example, in the GBK encoding, character number 33088 (which is 0x8140 in hex) represents the 丂 character in Chinese. Unicode includes another such mapping. In Unicode, this same character number represents the 腀 character in Chinese. Therefore, the code point number alone is insufficient unless you know what encoding it is in.<br />
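This encoding dependence is easy to demonstrate in JavaScript. Here is a sketch; it assumes a TextDecoder implementation that supports the "gbk" label, which browsers and recent Node.js builds provide:

```javascript
// The byte pair 0x81 0x40, interpreted as GBK, decodes to 丂 (U+4E02).
const fromGBK = new TextDecoder("gbk").decode(new Uint8Array([0x81, 0x40]));
console.log(fromGBK); // 丂

// The same number, 0x8140, interpreted as a Unicode code point, is 腀.
const fromUnicode = String.fromCodePoint(0x8140);
console.log(fromUnicode); // 腀
```

Same number, two different characters, depending entirely on which mapping you apply.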
<br />
Unicode is special because it aims to include characters from every writing system on the planet. Therefore, it is a convenient choice for an internal encoding inside text engines. For example, if you didn't have a single internal encoding, all your editing commands would have to be reimplemented for each supported encoding. For this reason (and potentially some others), it has become the standard encoding for most text systems.<br />
<br />
<h3>
UTF-32</h3>
<br />
In Unicode, there are over 1 million (0x10FFFF) available options for code points, though most of those haven't been assigned a meaning yet. This means you need 21 bits to represent a code point. One way to do this is to use a 32-bit type and pad it out with zeroes. This is called UTF-32 (which is just a way of mapping a 21-bit number to a sequence of bytes so it can be stored). If you have one of these strings on disk or in memory, you need to know the endianness of each of these 4-byte numbers so that you can properly interpret it. You should already have an out-of-band mechanism to know what encoding the string is in, so this same mechanism is often re-used to describe the endianness of the bytes. (On the Web, this is HTTP headers or the &lt;meta&gt; tag.) There's also this neat hack called Byte Order Marks, if you don't have any out-of-band data.<br />
<br />
<h3>
UTF-16</h3>
<br />
Unfortunately, including 11 bits of 0s for every character is kind of wasteful. There is a more efficient encoding, called UTF-16. In this encoding, each code point may be encoded as either a single 16-bit number or a pair of 16-bit numbers. For code points which fit into a 16-bit number naturally, the encoding is the identity function. Unfortunately, there are over a million (0x100000) code points remaining which don't fit into a 16-bit number themselves. Because there are 20 bits of entropy in these remaining code points, we can split each into a pair of 10-bit numbers, and then encode this pair as two successive "code units." Once you've done that, you need a way of knowing, if someone hands you a 16-bit number, if it's a standalone code point or if it's part of a pair. This is done by reserving two 10-bit ranges inside the character mapping. By saying that code points 0xD800 - 0xDBFF are invalid, and code points 0xDC00 - 0xDFFF are invalid, we can now use these ranges to encode these 20-bit numbers. So, if someone hands you a 16-bit number, if it's in one of those ranges, you know you need to read a second 16-bit number, mask the 10 low bits of each, shift them together, and add 0x10000 to get the real code point (otherwise, the number is equal to the code point it represents).<br />
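The decoding steps just described can be sketched in JavaScript (the function name is mine; real engines do this internally):

```javascript
// Reconstruct a code point from a UTF-16 surrogate pair:
// take the low 10 bits of each half, shift them together, and add 0x10000.
function decodeSurrogatePair(high, low) {
  if (high < 0xD800 || high > 0xDBFF) throw new Error("not a high surrogate");
  if (low < 0xDC00 || low > 0xDFFF) throw new Error("not a low surrogate");
  return (((high & 0x3FF) << 10) | (low & 0x3FF)) + 0x10000;
}

// U+1F4A9 PILE OF POO is stored as the two code units 0xD83D 0xDCA9.
console.log(decodeSurrogatePair(0xD83D, 0xDCA9).toString(16)); // "1f4a9"
```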
<br />
There are some interesting details here. The first is that the two 10-bit ranges are distinct. It could have been possible to re-use the same 10-bit range for both items in the pair (and use its position in the pair to determine its meaning). However, if you have an item missing from a long string of these surrogates, it may cause every code point after the missing one to be wrong. By using distinct ranges, if you come across an unpaired surrogate (like two high surrogates next to each other), most text systems will simply consider the first surrogate alone, treat it like an unsupported character, and resume processing correctly at the next surrogate.<br />
<br />
<h3>
UTF-8</h3>
<br />
There's also another encoding called UTF-8, which represents code points as sequences of 1, 2, 3, or 4 bytes. Because it uses bytes, endianness is irrelevant. However, the encoding is more complicated and it can be less efficient for some strings than UTF-16. It does have the nice property, however, that no byte within a UTF-8 string can be 0 (other than an encoded U+0000 itself), which means it is compatible with C strings.<br />
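A minimal sketch of the encoding rules (the helper function is mine; it covers Unicode's 1-4 byte range and skips error handling for surrogates):

```javascript
// Encode a single code point as a sequence of UTF-8 bytes.
function utf8Encode(cp) {
  if (cp < 0x80) return [cp]; // ASCII: one byte, high bit clear
  if (cp < 0x800) return [0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)];
  if (cp < 0x10000)
    return [0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)];
  return [
    0xF0 | (cp >> 18),
    0x80 | ((cp >> 12) & 0x3F),
    0x80 | ((cp >> 6) & 0x3F),
    0x80 | (cp & 0x3F),
  ];
}

console.log(utf8Encode(0x00E9));  // é encodes as bytes 0xC3 0xA9
console.log(utf8Encode(0x1F4A9)); // 💩 encodes as bytes 0xF0 0x9F 0x92 0xA9
```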
<br />
<h3>
"💩".length === 2</h3>
<br />
Because its encoding is somewhat simple, but fairly compact, many text systems, including Web browsers, ICU, and Cocoa strings, use UTF-16. This decision has actually had kind of a profound impact on the web. It is the reason that the "length" attribute of a string holding a single emoji can return 2: "length" returns the number of code units in the UTF-16 string, not the number of code points. If it wanted to return the number of code points, it would require linear time to compute. The choice of which number represents which "character" (or emoji) isn't completely arbitrary, and some things we think of as emoji actually have a number value less than 0x10000. This is why some emoji have a length of two while others have a length of one.<br />
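You can see this distinction directly in JavaScript; string iteration (unlike "length") is code-point-aware:

```javascript
const poo = "💩"; // U+1F4A9: above 0xFFFF, so two UTF-16 code units
console.log(poo.length);      // 2 - counts code units
console.log([...poo].length); // 1 - iteration yields whole code points
console.log(poo.codePointAt(0).toString(16)); // "1f4a9"

const watch = "⌚"; // U+231A: an emoji below 0x10000, so one code unit
console.log(watch.length);    // 1
```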
<br />
<h2>
Combining code points</h2>
<br />
Unicode also includes the concept of combining marks. The idea is that if you want to have the character "é", you can represent it as the "e" character followed by U+0301 COMBINING ACUTE ACCENT. This is so that every combination of diacritic marks and base characters doesn't have to be encoded in Unicode. It's important because, once a code point is assigned a meaning, it can never ever be un-assigned.<br />
<br />
To make matters worse, there is also a standalone code point U+00E9 LATIN SMALL LETTER E WITH ACUTE. When doing string comparisons, these two strings need to be equal. Therefore, string comparisons aren't just raw byte comparisons.<br />
<br />
This idea can happen even without these zero-width combining marks. In Korean, adjacent letters in words are grouped up to form blocks. For example, the letters ㅂ ㅏ ㅂ join to form the Korean word for "rice:" 밥 (read from top left to bottom right). Unicode includes a code point for each letter of the alphabet (ㅂ is U+3142 HANGUL LETTER PIEUP), as well as a code point for each joined block (밥 is U+BC25). It also includes joining letters, so 밥 can be represented as a single code point, but can also be represented by the string:<br />
<br />
U+1107 HANGUL CHOSEONG PIEUP<br />
U+1161 HANGUL JUNGSEONG A<br />
U+11B8 HANGUL JONGSEONG PIEUP<br />
<br />
This means, in JavaScript, you can have two strings which are treated exactly equally by the text system, and look visually identical (they literally have the same glyph drawn on screen), but have different lengths in JavaScript.<br />
<br />
<h3>
Normalization</h3>
<br />
One way to perform these string comparisons is to use Unicode's notion of "normalization." The idea is that strings which are conceptually equal should be normalized to the same sequence of code points. There are a few different normalization algorithms, depending on if you want the string to be exploded as much as possible into its constituent parts, or if you want it to be combined to be as short as possible, etc.<br />
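JavaScript exposes these algorithms through String.prototype.normalize(); a quick sketch of both the accent and Hangul cases:

```javascript
// "é" as one precomposed code point vs. "e" plus a combining accent.
const precomposed = "\u00E9";
const decomposed = "e\u0301";
console.log(precomposed === decomposed);  // false - raw comparison differs
console.log(precomposed.normalize("NFC") ===
            decomposed.normalize("NFC")); // true - after normalization

// The Korean block and its sequence of joining letters also unify under NFC.
const block = "\uBC25";               // 밥 as a single code point
const letters = "\u1107\u1161\u11B8"; // choseong, jungseong, jongseong
console.log(letters.normalize("NFC") === block); // true
console.log(block.normalize("NFD") === letters); // true - NFD explodes it again
```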
<br />
<h2>
Fonts</h2>
<br />
When reading text, people see pictures, not numbers. Or, put another way, computer monitors are not capable of showing you numbers; instead, they can only show pictures. All the picture information for text is contained within fonts. Unicode doesn't describe what information is included in a font file.<br />
<br />
When people think of emoji, they usually think of the little color pictures inside our text. These little color pictures come from font files. A font file can do whatever it wants with the string it is tasked with rendering. It can draw emoji without color. It can draw non-emoji with color. The idea of color in a glyph is orthogonal to whether or not a code point is classified as "emoji."<br />
<br />
Similarly, a font can include a ligature, which draws multiple code points as a single glyph ("glyph" just means "picture"). A font can also draw a single code point as multiple glyphs (for example, the accent in é may be implemented as a separate glyph from the e). But it doesn't have to. The choice of which glyphs to use where is totally an implementation detail of the font, as is the choice of which glyphs include color. Some ligatures get caret positions inside them; others don't.<br />
<br />
For example, Arabic is a cursive script, which means that the letters flow together from one to the next. Here are two images of two different fonts (Geeza Pro and Noto Nastaliq Urdu) rendering the same string, where each glyph is painted in a different color. You can see that both fonts show the string with a different number of glyphs. Sometimes diacritics are contained within their base glyph, but sometimes not.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg90OQPb-a2jc9ghhKLdfSy0UAxgI8xefpr8uriJVFRzhPWSI0lbV-1DKzLODSzuUYdOQqE4gYs2lU__K6zqrK9wk4GYnEQ7fBupd_ISqvrTcBsoHWdTYPBYPYnFBPHuuRmiQBCg_fgtJzE/s1600/IMG_0079.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="202" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg90OQPb-a2jc9ghhKLdfSy0UAxgI8xefpr8uriJVFRzhPWSI0lbV-1DKzLODSzuUYdOQqE4gYs2lU__K6zqrK9wk4GYnEQ7fBupd_ISqvrTcBsoHWdTYPBYPYnFBPHuuRmiQBCg_fgtJzE/s320/IMG_0079.jpg" width="320" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgVNDraxRKiQcTX7Si5pNhV2Iryzn-iInFoPld7_NeEUc8QbXbCzpDz3-ROAF7fR9Jge1bY_aXEET8-vw2h63Q0NaK8siRlLDKFF74bSMhpbjo3R6Z98EgxIhn0NLNdJdAQ48XGptS0ZmwD/s1600/IMG_0078.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="233" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgVNDraxRKiQcTX7Si5pNhV2Iryzn-iInFoPld7_NeEUc8QbXbCzpDz3-ROAF7fR9Jge1bY_aXEET8-vw2h63Q0NaK8siRlLDKFF74bSMhpbjo3R6Z98EgxIhn0NLNdJdAQ48XGptS0ZmwD/s320/IMG_0078.jpg" width="320" /></a></div>
<br />
<h2>
Variation Selectors</h2>
<br />
There are other classes of code points which are invisible and are added after a base code point to modify it. One example is the use of Variation Selector 15 and Variation Selector 16. The problem these try to solve is the fact that some code points may be drawn in either text style (☃︎) or emoji style (☃️). Variation Selector 16 is an invisible code point that means "please draw the base character like an emoji" while #15 means "please draw the base character like text." The platform also has a default representation which is used when no variation selector is present. Unicode includes a table of which code points should be able to accept these variation selectors (but, like everything Unicode creates, it affects but doesn't dictate implementations).<br />
<br />
These variation selectors are a little special because they are the only combining code points I know of that can interact with the "cmap" table in the font, and can therefore affect font selection. This means that a font can say "I support the snowman code point, but not the emoji style of it." Many text systems have special processing for these variation selectors.<br />
<br />
<h2>
Zero-Width-Joiner Sequences</h2>
<br />
Rendering on old platforms is also important when Unicode defines new emoji. Some Unicode characters, such as "👨&#8205;👩&#8205;👧", are a collection of things (people) which can already be represented with other code points. This specific "emoji" is actually the string of code points:<br />
<br />
U+1F468 MAN<br />
U+200D ZERO WIDTH JOINER<br />
U+1F469 WOMAN<br />
U+200D ZERO WIDTH JOINER<br />
U+1F467 GIRL<br />
<br />
The zero width joiners are necessary for backwards compatibility. If someone had a string somewhere that was just a list of people in a row, the creation of this new "emoji" shouldn't magically join them up into a family. The benefit of using the collection of code points is that older systems showing the new string will show something understandable instead of just an empty square. Fonts often implement these as ligatures. Unicode specifies which sequences should be represented by a single glyph, but, again, it's up to each implementation to actually do that, and implementations vary.<br />
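The structure of this sequence is visible from JavaScript:

```javascript
// The "family" emoji written out explicitly: MAN, ZWJ, WOMAN, ZWJ, GIRL.
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}";
console.log(family.length);      // 8 - three surrogate pairs plus two ZWJs
console.log([...family].length); // 5 code points
console.log([...family].map(c => c.codePointAt(0).toString(16)));
// ["1f468", "200d", "1f469", "200d", "1f467"]
```

Whether this draws as one glyph or as three people is up to the font and text engine rendering it.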
<br />
<h2>
Caret Positions</h2>
<br />
Similarly to how Unicode describes sequences of code points which should visually combine to a single thing, Unicode also describes what a "character" is, in the sense of what most people mean when they say "character." Unicode calls this a "grapheme cluster." Part of the ICU library (which implements pieces of Unicode) provides iterators which will give you all the locations where lines can break, where words can be formed (in Chinese this is hard), and where characters' boundaries lie. If you give it the string of "e" followed by U+0301 COMBINING ACUTE ACCENT, it should tell you that these code points are part of the same grapheme cluster. It does this by ingesting data tables which Unicode creates.<br />
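Newer JavaScript engines expose this segmentation directly via Intl.Segmenter, which is backed by the same Unicode data tables ICU uses (a sketch):

```javascript
// Split a string into grapheme clusters using the engine's Unicode tables.
const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });
const s = "e\u0301x"; // "e" + U+0301 COMBINING ACUTE ACCENT, then "x"
const clusters = [...segmenter.segment(s)].map(part => part.segment);
console.log(clusters); // ["é", "x"] - the accent stays with its base
console.log(s.length); // 3 code units, but only 2 grapheme clusters
```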
<br />
However, this isn't quite sufficient to know where to put the caret when the user presses the arrow keys, delete key, or forward-delete key (Fn + delete on macOS). Consider the following string in Hindi "कि". This is composed of the following two code points:<br />
<br />
U+0915 DEVANAGARI LETTER KA<br />
U+093F DEVANAGARI VOWEL SIGN I<br />
<br />
Here, if you select the text or use arrow keys, the entire string is selected as a unit. However, if you place the caret after the string and press delete, only the U+093F is deleted. This is particularly confusing because this vowel sign is actually drawn to the left of the letter, so it isn't even adjacent to the caret when you press delete. (Hindi is a left-to-right script.) If you place the caret just before the string and press the forward delete key (Fn + delete), both code points get deleted. The user expectations for the results of these kinds of editing commands are somewhat platform-specific, and aren't entirely codified in Unicode currently.<br />
Try it out here:<br />
<div contenteditable="true" style="background: white; color: black;">
==> कि <==</div>
<br />
<h2>
Simplified and Traditional Chinese</h2>
<br />
The Chinese language is many thousands of years old. In the 1950s and 1960s, the Chinese government (PRC) decided that their characters had too many strokes, and simplifying the characters would increase literacy rates. So, they decided to change how about 1/3 of the characters were written. Some of the characters were untouched, some were touched only very slightly, and some were completely changed.<br />
<br />
When Unicode started codifying these characters, they had to figure out whether or not to give the simplified characters new code points. For the characters which were completely unchanged, it is obvious they shouldn't get their own code points. For characters which were entirely changed, it is obvious that they should. But what about the characters which changed only slightly? These were decided on a case-by-case basis, and some of the slightly-changed characters did not receive their own new code points.<br />
<br />
This is really problematic for a text engine, because for these characters there is a discernible difference between the simplified and traditional forms, yet both are represented by the same code point. If you show the wrong form, the text is wrong. This means the text engine has to know out-of-band which form to show.<br />
<br />
Here's an example showing the same code point with two different "lang" tags.<br />
Simplified Chinese:<br />
<div lang="zh-Hans" style="font-size: 60px;">
雪</div>
Traditional Chinese:<br />
<div lang="zh-Hant" style="font-size: 60px;">
雪</div>
<br />
There are a few different mechanisms for this. HTML includes the "lang" attribute, which includes whether or not the language is supposed to be simplified or traditional. This is used during font selection. On macOS and iOS, every Chinese face actually includes two font files: one for Simplified Chinese and one for Traditional Chinese. (For example, PingFang SC and PingFang TC.) Browsers use the language of the element when deciding which of these fonts to use. If the lang tag isn't present or doesn't include the information browsers need, browsers will use the language the machine is configured to use.<br />
<br />
Rather than including two separate fonts for every face, another mechanism to implement this is by using font features. This is part of that "shaping" step I mentioned earlier. This shaping step can include a set of key/value pairs provided by the environment. CSS controls this with the font-variant-east-asian property. This works by having the font include glyphs for both kinds of Chinese, and the correct one is selected as part of text layout. This only works, however, with text renderers which support complex shaping and font features.<br />
<br />
I think there's at least one other way to have a single font file be able to draw both simplified and traditional forms, but I can't remember what it is right now.<br />
<br />
<h1>Single Screen GPU Handoff</h1>
<i>2016-11-16</i><br />
<br />
Over the past few years, a <a href="https://www.microsoft.com/en-us/surface/devices/surface-book/overview">collection</a> of <a href="http://www.apple.com/macbook-pro/">laptops</a> have been released with two graphics cards. The idea is that one is low-power and one is high-power. When you want long battery life, you can use the low-power GPU, but when you want high performance, you can use the high-power GPU. However, there is a wrinkle: the laptop only has one screen.<br />
<br />
The screen’s contents have to come from somewhere. One way to implement this system would be to daisy-chain the two GPUs, thereby keeping the screen always plugged into the same GPU. In this system, the primary GPU (which the screen is plugged into) would have to be told to give the results of the secondary GPU to the screen.<br />
<br />
A different approach is to connect both GPUs in parallel with a switch between them. The system will decide when to flip the switch between each of the GPUs. When the screen is connected to one GPU, the other GPU can be turned off completely.<br />
<br />
The question, then, is how this looks to a user application. I’ll be investigating three different scenarios here. Note that I’m not discussing what happens if you drag a window between two different monitors each plugged into a separate card; instead, I’m discussing the specific hardware which allows multiple graphics cards to display to the same monitor.<br />
<br />
<h2>
OpenGL on macOS</h2>
<br />
On macOS, you can tell which GPU your OpenGL context is running on by running glGetString(GL_VENDOR). When you create your context, you declare whether or not you are capable of using the low-power GPU (the high-power GPU is the default). macOS is designed so that if any context requires the high-power GPU, the whole system flips to use it. This is observable by using <a href="https://gfx.io/">gfxCardStatus</a>. This means that the whole system may switch out from under you while your app is running because of something a completely different app did.<br />
<br />
For many apps, this isn’t a problem because macOS will copy your OpenGL resources between the GPUs, which means your app may be able to continue without caring that the switch occurred. This works because the OpenGL context itself survives the switch, but the internal renderer changes. Because the context is still alive, your app can likely continue.<br />
<br />
The problem, though, is with OpenGL extensions. Different renderers support different extensions, and app logic may depend on the presence of an extension. On my machine, the high-powered GPU supports both GL_EXT_depth_bounds_test and GL_EXT_texture_mirror_clamp, but the low-powered one doesn’t. Therefore, if an app relies on an extension, and the renderer changes in the middle of operation, the app may malfunction. The way to fix this is to listen to the NSWindowDidChangeScreenNotification in the default NSNotificationCenter. When you receive this notification, re-interrogate the OpenGL context for its supported extensions. Note that switching in both directions may occur - the system switches to the high-power GPU when some other app is launched, and the system switches back when that app is quit.<br />
<br />
You only have to do this if you opt-in to running on the low-power GPU, because if you don’t opt in, you will run on the high-power GPU, which means your app will be the app keeping the system on the high-power GPU, which means the system will never switch back while your app is alive.<br />
<br />
<h2>
Metal on macOS</h2>
<br />
Metal takes a different approach. When you want to create a MTLDevice, you must choose which GPU your device reflects. There is an API call, MTLCopyAllDevices(), which will simply return a list, and you are free to interrogate each device in the list to determine which one you want to run on. In addition, there’s a MTLCreateSystemDefaultDevice() which will simply pick one for you. On my machine, this “default device” isn’t magical - it is simply exactly equal (by pointer equality) to one of the items in the list that MTLCopyAllDevices() returns. On my machine, it returns the high-powered GPU.<br />
<br />
However, MTLDevices don’t have the concept of an internal renderer. In fact, even if you cause the system to change the active GPU (using the above approach of making another app create an OpenGL context), your MTLDevice still refers to the same device that it did when you created it.<br />
<br />
I was suspicious of this, so I ran a performance test. I created a shader which got 28 fps on the high-powered GPU and 11 fps on the low-powered one. While this program was running on the low-powered GPU, I opened up an OpenGL app which I knew would cause the system to switch to the high-powered GPU, and I saw that the app’s fps didn’t change. Therefore, the Metal device doesn’t migrate to a new GPU when the system switches GPUs.<br />
<br />
Another interesting thing I noticed during this experiment was that the Metal app was responsive throughout the entire test. This means that the rendering was being performed on the low-power GPU, but the results were being shown on the high-power GPU. I can only guess that this means that the visual results of the rendering are being copied between GPUs every frame. This would also seem to mean that both GPUs were on at the same time, which seems like it would be bad for battery life.<br />
<br />
<h2>
DirectX 12 on Windows 10</h2>
<br />
I recently bought a Microsoft Surface Book which has the same kind of setup: one low-power GPU and one high-power GPU. Similarly to Metal, when you create a DirectX 12 context, you have to select which adapter you want to use. IDXGIFactory4::EnumAdapters1() returns a list of adapters, and you are free to interrogate them and choose which one you prefer. However, there is no separate API call to get the default adapter; there is simply a convention that the first device in the list is the one you should be using, and that it is the low-power GPU.<br />
<br />
As I stated above, on macOS, switching to the discrete GPU is all-or-nothing - the screen’s signal is either coming from the high-power GPU or the low-power GPU. I don’t know whether or not this is true on Windows 10 because I don’t know of a way to observe it there.<br />
<br />
However, an individual DirectX 12 context won’t migrate between GPUs on Windows 10. This is observable with a similar test as the one described above. Automatic migration occurred on previous versions of Windows, but it doesn’t occur now.<br />
<br />
Therefore, the model here is similar to Metal on macOS, so it seems like the visual results of rendering are copied between the two cards, and that both cards are kept on at the same time if there are any contexts executing on the high-power GPU.<br />
<br />
However, the Surface Book has an interesting design: the high-power GPU is in the bottom part of the laptop, near the keyboard, and the laptop’s upper (screen) half can separate from the lower half. This means that the high-power GPU can be removed from the system.<br />
<br />
Before the machine’s two parts can be separated, the user must press a special button on the keyboard which is more than just a physical switch. It causes software to run which inspects all the contexts on the machine to determine if any app is using the high-powered GPU on the bottom half of the machine. If it is being used by any app, the machine refuses to separate from the base (and shows a pop up asking the user to please quit the app, or presumably just destroy the DirectX context). There is currently no way for the app to react to the button being pressed so that it could destroy its context. Instead, currently, the user must quit the app.<br />
<br />
However, it is possible to lose your DirectX context in other ways. For example, if a user connects to your machine via Terminal Services (similar to VNC), the system will switch from a GPU-accelerated environment to a software-rendering environment. To an app, this will look like the call to IDXGISwapChain3::Present() will return DXGI_ERROR_DEVICE_REMOVED or DXGI_ERROR_DEVICE_RESET. Apps should react to this by destroying their device and re-querying the system for the present devices. This sort of thing will also happen when Windows Update updates GPU drivers or when some older Windows versions (before Windows 10) perform a global low-power to high-power (or vice-versa) switch. So, a well-formed app should already be handling the DEVICE_REMOVED error. Unfortunately, this doesn’t help the use case of separating the two pieces of the Surface Book.<br />
<br />
Thanks to Frank Olivier for lots of help with this post.<br />
<br />
<h1>Variation Fonts Demo</h1>
<i>2016-09-30</i><br />
<style>
@keyframes gainWeight {
from {
font-variation-settings: "wght" 0.5;
}
to {
font-variation-settings: "wght" 3;
}
}
@keyframes gainWidth {
from {
font-variation-settings: "wdth" 0.7;
}
to {
font-variation-settings: "wdth" 1.2;
}
}
@keyframes gainBoth {
from {
font-variation-settings: "wdth" 0.7, "wght" 0.5;
}
to {
font-variation-settings: "wdth" 1.2, "wght" 3;
}
}
</style>
Try opening this in a recent <a href="https://webkit.org/downloads/">Safari nightly build</a>.<br />
<br/>
The first line shows the text with no variations.<br />
The second line animates the weight.<br />
The third line animates the width.<br />
The fourth line animates both.<br />
<br />
<div style="font: 48px 'skia'; text-decoration: underline; transform-origin: left top;">
<div>
hamburgefonstiv</div>
<div style="animation-name: 'gainWeight'; animation-direction: alternate; animation-duration: 3s; animation-iteration-count: infinite; animation-timing-function: ease-in-out;">
hamburgefonstiv</div>
<div style="animation-name: 'gainWidth'; animation-direction: alternate; animation-duration: 3s; animation-iteration-count: infinite; animation-timing-function: ease-in-out;">
hamburgefonstiv</div>
<div style="animation-name: 'gainBoth'; animation-direction: alternate; animation-duration: 3s; animation-iteration-count: infinite; animation-timing-function: ease-in-out;">
hamburgefonstiv</div>
</div>
<br />
<h1>Variable Fonts in CSS Draft</h1>
<i>2016-09-22</i><br />
<br />
Recently, the CSS Working Group in the W3C resolved to pursue adding support for variable fonts within CSS. A draft has been added to the <a href="https://github.com/w3c/csswg-drafts/blob/master/css-fonts-4/Overview.bs">CSS Fonts Level 4 spec</a>. Your questions and comments are extremely appreciated, and will help shape the future of variation font support in CSS! Please add them to a <a href="https://github.com/w3c/csswg-drafts/issues/new">new CSS GitHub issue</a>, tweet at <a href="https://twitter.com/Litherum">@Litherum</a>, email <a href="mailto:mmaxfield@apple.com">mmaxfield@apple.com</a>, or use any other means to get in contact with anyone at the CSSWG! Thank you very much!<br />
<br />
Here is what CSS would look like using the current draft:<br />
<br />
<style>
code {
display: block;
font-size: 0.9em;
background: rgb(40, 40, 40);
border-left: 10px solid rgb(100, 100, 100);
padding: 10px;
}
</style>
1. Use a preinstalled font with a semibold weight:<br />
<br />
<code><div style="font-weight: 632;">hamburgefonstiv</div></code><br />
<br />
2. Use a preinstalled font with a semicondensed weight:<br />
<br />
<code><div style='font-stretch: 83.7%;'>hamburgefonstiv</div></code><br />
<br />
3. Use the "ital" axis to enable italics<br />
<br />
<code>/* Note: No change! The browser can enable variation italics automatically. */<br />
<div style="font-style: italic;">hamburgefonstiv</div></code><br />
<br />
4. Set the "fancy" axis to 9001:<br />
<br />
<code><div style="<br />font-variation-settings: 'fncy' 9001;">hamburgefonstiv</div></code><br />
<br />
5. Animate the weight and width axes together:<br />
<br />
<code>@keyframes zooming {<br />
<span class="Apple-tab-span" style="white-space: pre;"> </span>from {<br />
<span class="Apple-tab-span" style="white-space: pre;"> </span>font-variation-settings: 'wght' 400, 'wdth' 85;<br />
<span class="Apple-tab-span" style="white-space: pre;"> </span>}<br />
<br />
<span class="Apple-tab-span" style="white-space: pre;"> </span>to {<br />
<span class="Apple-tab-span" style="white-space: pre;"> </span>font-variation-settings: 'wght' 800, 'wdth' 105;<br />
<span class="Apple-tab-span" style="white-space: pre;"> </span>}<br />
}<br />
<br />
<div style="animation-duration: 3s;<br />animation-name: zooming;">hamburgefonstiv</div></code><br />
<br />
6. Use a variation font as a web font (without fallback):<br />
<br />
<code>@font-face {<br />
<span class="Apple-tab-span" style="white-space: pre;"> </span>/* Note that this is identical to what you currently do today! */<br />
<span class="Apple-tab-span" style="white-space: pre;"> </span>font-family: "VariationFont";<br />
<span class="Apple-tab-span" style="white-space: pre;"> </span>src: url("VariationFont.otf");<br />
}<br />
<br />
<div style="font-family: 'VariationFont';"> hamburgefonstiv</div></code><br />
<br />
7. Use a variation font as a web font (with fallback):<br />
<br />
<code>@font-face {<br />
<span class="Apple-tab-span" style="white-space: pre;"> </span>font-family: 'FancyFont';<br />
<span class="Apple-tab-span" style="white-space: pre;"> </span>src: url("FancyFont.otf") format("opentype-variations"), url("FancyFont-600.otf") format("opentype");<br />
<span class="Apple-tab-span" style="white-space: pre;"> </span>font-weight: 600;<br />
<span class="Apple-tab-span" style="white-space: pre;"> </span>/* Old browsers fail to parse "615", so it is<br />
<span class="Apple-tab-span" style="white-space: pre;"> </span>ignored and 600 remains. New browsers parse it<br />
<span class="Apple-tab-span" style="white-space: pre;"> </span>correctly, so 615 wins. Note that, because of<br />
<span class="Apple-tab-span" style="white-space: pre;"> </span>the font selection rules, the font-weight<br />
<span class="Apple-tab-span" style="white-space: pre;"> </span>descriptor above may be sufficient, thereby<br />
<span class="Apple-tab-span" style="white-space: pre;"> </span>making the font-weight descriptor below<br />
<span class="Apple-tab-span" style="white-space: pre;"> </span>unnecessary. */<br />
<span class="Apple-tab-span" style="white-space: pre;"> </span>font-weight: 615;<br />
}<br />
<br />
#fancy {<br />
<span class="Apple-tab-span" style="white-space: pre;"> </span>font-family: "FancyFont";<br />
<span class="Apple-tab-span" style="white-space: pre;"> </span>font-weight: 600;<br />
<span class="Apple-tab-span" style="white-space: pre;"> </span>font-weight: 615;<br />
}<br />
<br />
<div id="fancy">hamburgefonstiv</div></code><br />
<br />
8. Use two variations of the same variation font:<br />
<br />
<code>@font-face {<br />
<span class="Apple-tab-span" style="white-space: pre;"> </span>font-family: "VariationFont";<br />
<span class="Apple-tab-span" style="white-space: pre;"> </span>src: url("VariationFont.otf");<br />
<span class="Apple-tab-span" style="white-space: pre;"> </span>font-weight: 400;<br />
}<br />
<br />
<div style="font-family: VariationFont; font-weight: 300;">hamburgefonstiv</div><br />
<br />
<div style="font-family: VariationFont; font-weight: 700;">hamburgefonstiv</div></code><br />
<br />
9. Combine two variation fonts together as if they were a single font: one for weights 1-300 and another for weights 301-999:<br />
<br />
<code>@font-face {<br />
<span class="Apple-tab-span" style="white-space: pre;"> </span>font-family: "SegmentedVariationFont";<br />
<span class="Apple-tab-span" style="white-space: pre;"> </span>src: url("SegmentedVariationFont-LightWeights.otf");<br />
<span class="Apple-tab-span" style="white-space: pre;"> </span>font-weight: 1;<br />
}<br />
<br />
@font-face {<br />
<span class="Apple-tab-span" style="white-space: pre;"> </span>/* There is complication here due to the peculiar nature of the font selection rules.<br />
<span class="Apple-tab-span" style="white-space: pre;"> </span>Note how this block uses the same source file as the block below. */<br />
<span class="Apple-tab-span" style="white-space: pre;"> </span>font-family: "SegmentedVariationFont";<br />
<span class="Apple-tab-span" style="white-space: pre;"> </span>src: url("SegmentedVariationFont-HeavyWeights.otf");<br />
<span class="Apple-tab-span" style="white-space: pre;"> </span>font-weight: 301;<br />
}<br />
<br />
@font-face {<br />
<span class="Apple-tab-span" style="white-space: pre;"> </span>font-family: "SegmentedVariationFont";<br />
<span class="Apple-tab-span" style="white-space: pre;"> </span>src: url("SegmentedVariationFont-HeavyWeights.otf");<br />
<span class="Apple-tab-span" style="white-space: pre;"> </span>font-weight: 999;<br />
}</code>Litherumhttp://www.blogger.com/profile/12738405376090442005noreply@blogger.com0tag:blogger.com,1999:blog-8778351438463999796.post-26825558264023338132016-09-15T10:03:00.000-07:002016-09-15T10:03:08.059-07:00Font Taxonomy<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhrkCfOcHcyFcck4MPMyRqjVu8nWqKHd1av430gfEFWCwPImFmujBkiG2Flt6ZtWMpq8pv4trRrpMS9BBOWcrgCghhwQv2GGmTC7eaKKOK9dTDYdhTHecdo1-_gwe-1GGpuZLAkyd7DCSsY/s1600/Screen+Shot+2016-09-15+at+10.02.39+AM.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="282" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhrkCfOcHcyFcck4MPMyRqjVu8nWqKHd1av430gfEFWCwPImFmujBkiG2Flt6ZtWMpq8pv4trRrpMS9BBOWcrgCghhwQv2GGmTC7eaKKOK9dTDYdhTHecdo1-_gwe-1GGpuZLAkyd7DCSsY/s320/Screen+Shot+2016-09-15+at+10.02.39+AM.png" width="320" /></a></div>
<br />Litherumhttp://www.blogger.com/profile/12738405376090442005noreply@blogger.com0tag:blogger.com,1999:blog-8778351438463999796.post-56268595211702195242016-09-03T14:16:00.004-07:002016-09-03T14:16:38.974-07:00OpenGL on iOSThe model of OpenGL on iOS is much simpler than that on macOS. In particular, the context creation routine on macOS is older than the concept of OpenGL frame buffers, which is why it is structured the way that it is. Back then, the model was much simpler: the OS gave you a buffer, and you drew stuff into it. If you wanted to render offscreen, you had to ask the OS to give you an offscreen buffer.<br />
<br />
That all changed with frame buffer objects. Now, in OpenGL, you can create your own offscreen render targets, render into them, and when you’re done, read from them (either as a texture or into host memory). This means that there is a conceptual divide between that buffer the OS gives you when you create your context, and the frame buffer objects you have created in your own OpenGL code.<br />
<br />
On iOS, the OpenGL infrastructure was created with frame buffer objects in mind. Instead of asking the OS for a buffer to render into, you ask the OS to assign a backing store to a renderbuffer (which is part of a framebuffer). Specifically, you do this after the OpenGL context is created. This means that almost all of those creation parameters are now unnecessary, since most of them define the structure of that buffer the OS gives you. Indeed, on iOS, when you create a context, the only thing you specify is which version of OpenGL ES you want to use.<br />
<br />
On iOS, the way you render directly to the screen is with CoreAnimation layers. There is a method on EAGLContext, renderbufferStorage:fromDrawable: which connects an EAGLDrawable with a renderbuffer. Currently, CAEAGLLayer is the only class which implements EAGLDrawable, which means you have to draw into a layer in the CoreAnimation layer tree. (You can also draw into an offscreen IOSurface by wrapping a texture around it and using render-to-texture, as detailed in my previous post).<br />
<br />
This model is quite different from CAOpenGLLayer, as used on macOS. Here, you can affect the properties of the drawable by setting the drawableProperties property on the EAGLDrawable.<br />
<br />
There is a higher-level abstraction: a GLKView, which subclasses UIView. This class has a GLKViewDelegate which provides the drawing operations. It has properties which let you specify the attributes of the drawable. There’s also the associated GLKViewController, which subclasses UIViewController and has its own GLKViewControllerDelegate. This delegate has an update() method, which is called between frames. The idea is that you shouldn’t need to subclass GLKView or GLKViewController; instead, you subclass the delegates.<br />
<br />
Many iOS devices have retina screens. The programmer has to opt in to high-density screens by setting the contentsScale property of the CAEAGLLayer to whatever UIScreen.nativeScale is set to. If you don’t do this, your view will be stretched and blurry. This also means that you have to take care to update any places where you interact with pixel data directly, like glReadPixels().<br />
<br />
iOS devices also support multiple monitors via AirPlay. With AirPlay, an app can render content on to a remote display. However, the model for this is a little different than on macOS: instead of the user dragging a window to another monitor, and the system telling the app about it, the app handles the movement to the external monitor. The system will give you a UIScreenDidConnectNotification / UIScreenDidDisconnectNotification when the user enables AirPlay. Then, you can see that the [UIScreen screens] array has multiple items in it. You can then move a view hierarchy to the external screen by assigning the screen to your UIWindow’s screen property. You can create a new UIWindow by using the regular alloc / initWithFrame constructor and passing in the UIScreen’s bounds. You then set the rootViewController of this new window to whatever you want to show on the external monitor. Therefore, when this occurs, you have the freedom to query the properties of the remote screen (using UIScreen APIs, such as UIScreen.nativeScale) and react accordingly. For example, if you have a retina device but you are moving content to a 1x screen, you can know this by querying the screen at the time you move the window to it.<br />
<br />
On macOS, an OpenGL context could have many renderers inside it, with only one being active at a current time. On iOS devices, there is only one GPU, which means there is only one renderer. This means you don’t have to worry about a switch in renderers. This means that the model is much simpler and you don’t have to worry so much about things changing out from under you.Litherumhttp://www.blogger.com/profile/12738405376090442005noreply@blogger.com0tag:blogger.com,1999:blog-8778351438463999796.post-67597987926041563712016-08-22T01:56:00.005-07:002016-08-22T01:57:33.018-07:00OpenGL on macOSOpenGL is a specification created by a cross-vendor group, and is designed to work on all (fairly modern) graphics cards. While this sounds obvious, it actually has some interesting implications. It means that nothing platform-specific is inside the OpenGL spec itself. Instead, only the common pieces are inside the spec.<br />
<br />
In addition, technically, OpenGL is not a piece of software. OpenGL is a document designed for humans to read. There are many libraries written by many people which claim to implement this spec, but it’s important to realize that these libraries are not OpenGL itself. There can be problems with an individual implementation, and there can be problems with the spec, and those are separate problems.<br />
<br />
OpenGL operates inside a “context” which is “current” to a thread. However, the spec doesn’t include any way of interacting with this context directly (like creating it or making it current). This is because each platform has its own way of creating this context. On macOS, this is done with the CGL (Core OpenGL) framework.<br />
<br />
Another example of something not existing in the spec is the issue of device memory availability. The OpenGL spec does not list any way to ask the device how much memory is available or used on the device. This is because GPUs can be implemented with many different regions of memory with different performance characteristics. For example, many GPUs have a separate area where constant memory or texture memory lives. On the other hand, an integrated GPU uses main memory, which is shared with regular applications, so the whole concept of available graphics memory doesn’t make a lot of sense. (Also, imagine a theoretical GPU with automatic memory compression.) Indeed, these varied memory architectures are incredibly valuable, and GPU vendors should be able to innovate in this space. If being able to ask for available memory limits were added to the spec, it would either 1) be simple but meaningless on many GPUs with varied memory architectures, or 2) be so generic and nebulous that it would be impossible for a program to make any actionable decisions at runtime. The lack of such an API is actually a success, not an oversight. If you are running on a specific GPU whose memory architecture you understand, perhaps the vendor of that GPU can give you a vendor-specific API to answer these kinds of questions in a platform-specific way. However, this API would only work on that specific GPU.<br />
<br />
Another example is the idea of “losing” a context. Most operating systems include mechanisms which will cause your OpenGL context to become invalid, or “lost.” Each operating system has its own affordances for why a context may be lost, or how to listen for events which may cause the context to be lost. Similar to context creation, this concept falls squarely in the “platform-dependent” bucket. Therefore, the spec itself just assumes your context is valid, and it is the programmer’s responsibility to make sure that’s true on any specific operating system.<br />
<br />
As mentioned above, OpenGL contexts on macOS are interacted with directly by using CGL (in addition to its higher-level NSOpenGL* wrappers). There are a few concepts involved with using CGL:<br />
<ul>
<li>Pixel Formats</li>
<li>Renderers</li>
<li>Virtual Screens</li>
<li>Contexts</li>
</ul>
<div>
<br /></div>
A context is the thing you need to run OpenGL functions. In order to create a context, you need to specify a pixel format. This is a configuration of the external resources the context will be able to access. For example, you can say things like “Make a double-buffered color buffer 8 bits-per-channel, with a similar 8-bit depth buffer.” This information needs to be specified on the context itself (and is therefore not in the OpenGL spec because it’s platform-specific) because there is a relationship between what you specify here and the integration with the rest of the machine. For example, you can only successfully create a context with a pixel format that the window server understands, because at the end of the day, the window server needs to composite the output of your OpenGL rendering with the rest of the windows on the system. (This is also the reason why there’s no “present” call in the OpenGL spec - it requires interaction with the platform-specific window server.)<br />
<br />
Because the pixel format attributes also act as configuration parameters to the renderer in general, this is also the place where you specify things like which version of OpenGL the context should support (which is necessary because OpenGL deprecated some things and increasingly moves things from ARB extensions into core). Parameters like this one don’t affect the format of the pixels, per se, but they do affect the selection of the CGL renderer used to implement the OpenGL functions.<br />
<br />
A CGL renderer is conceptually similar to a vtable which backs the OpenGL drawing commands. There is a software renderer, as well as a renderer provided by the GPU driver. On a MacBook Pro with both an integrated and discrete GPU, different renderers are used for each one. A renderer can operate on one or more virtual screens, which are conceptually similar to physical screens attached to the machine, but generalized (virtualized) so it is possible to, for example, have a virtual screen that spans across two physical screens. There is a relationship between CGDisplayIDs and OpenGL virtual screens, so it’s possible to map back and forth between them. This means that you can get semantic knowledge of an OpenGL renderer based on existing context in your program. It’s possible to iterate through all the renderers on the system (and their relationships with virtual screens) and then use CGL to query attributes about each renderer.<br />
<br />
A CGL context has a set of renderers that it may use for rendering. (This set can have more than one object in it.) The context may decide to migrate from one renderer to another. When this happens, the context the application uses doesn’t change; instead if you query the context for its current renderer, it will just reply with a different answer.<br />
<br />
(Side note: it’s possible to create an OpenGL context where you specify exactly one renderer to use with kCGLPFARendererID. If you do this, the renderer won’t change; however, the virtual screen can change if, for example, the user drags the window to a second monitor attached to the same video card.)<br />
<br />
Therefore, this causes something of a problem. Inside a single context, the system may decide to switch you to a different renderer, but different renderers have different capabilities. Therefore, if you were relying on the specific capabilities of the current renderer, you may have to change your program logic if the renderer changes. Similarly, even if the renderer doesn’t change, but the virtual screen does change, your program may also need to alter its logic if it was relying on specific traits of the screen. Luckily, if the renderer changes, then the virtual screen will also change (even on a MacBook Pro with integrated & discrete GPU switching).<br />
<br />
On macOS, the only supported way to show something on the screen is to use Cocoa (NSWindow / NSView, etc.). Therefore, using NSOpenGLView with NSOpenGLContext is a natural fit. The best part of NSOpenGLView is that it provides an “update” method which you can override in a subclass. Cocoa will call this update method any time the view’s format changes. For example, if you drag a window from a 1x screen to a 2x screen, Cocoa will call your “update” method, because you need to be aware that the format changed. Inside the “update” function, you’re supposed to investigate the current state of the world (including the current renderer / format / virtual screen, etc.), figure out what changed, and react accordingly.<br />
<br />
This means that using the “update” method on NSOpenGLView is how you support Hi-DPI screens. You also should opt-in to Hi-DPI support using wantsBestResolutionOpenGLSurface. If you don’t do this and you’re using a 2x display, your OpenGL content will be rendered at 1x and then stretched across the relevant portion of the 2x display. You can convert between these logical coordinates and the 2x pixel coordinates by using the convert*ToBacking methods on NSView. By default, this stretching happens so calls like glReadPixels() will still work in the default case even without mapping coordinates to their backing equivalent. (Therefore, if you want to support 2x screens, all your calls which interact with pixels directly, like glReadPixels(), will need to be updated.)<br />
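The logical-to-pixel mapping above is just a multiplication by the scale factor. A minimal Python sketch of the conversion (mirroring what NSView’s convertRectToBacking: does, assuming a uniform scale factor):

```python
def to_backing(rect, scale):
    # Convert a rectangle in logical (point) coordinates into pixel
    # coordinates, for use with direct pixel APIs like glReadPixels().
    # Mirrors NSView's convertRectToBacking:, assuming a uniform scale.
    x, y, w, h = rect
    return (x * scale, y * scale, w * scale, h * scale)

# On a 2x display, a 100x100-point view is backed by 200x200 pixels:
print(to_backing((0, 0, 100, 100), 2.0))  # (0.0, 0.0, 200.0, 200.0)
```

The inverse (dividing by the scale) takes pixel coordinates back to points, which is what convertRectFromBacking: does.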
<br />
Similarly, NSOpenGLView has a property which supports wide-gamut color: wantsExtendedDynamicRangeOpenGLSurface. There is an explanatory comment next to this property which describes how normally colors are clipped in the 0.0 - 1.0 range, but if you set this boolean, the maximum clipping value may increase to something larger than 1.0 depending on which monitor you’re using. You can query this by asking the NSScreen for its maximumExtendedDynamicRangeColorComponentValue. Similar to before, the update method should be called whenever anything relevant here changes, thereby giving you an opportunity to investigate what changed and react accordingly.<br />
<br />
However, if you increase the color gamut that your numbers are supposed to span (think: raising that maximum clipping value above 1.0), one of two things will happen:<br />
<ul>
<li>You keep the same number of representable values as before, but spread each representable value farther from its neighbors (so that the same number of representable values spans the larger space)</li>
<li>You add more representable values to keep the density of representable values the same (or higher!) than before.</li>
</ul>
<br />
The first option sucks because the distance between adjacent representable values is fairly close to the minimum perception threshold of our eyes. Therefore, if you increase the distance between adjacent representable values, these “adjacent” colors actually start looking fairly distinct to us humans. The effect becomes obvious if you look at what should be a smooth gradient, because you see bands of solid color instead of the smooth transition.<br />
<br />
The second option sucks because more representable values means more information, which means your numbers have to be held in more bits. More bits means more memory is required.<br />
<br />
Usually, the best solution is to pay for the additional memory. You can repurpose the alpha channel bits for the color channels and go to a 10-bit/10-bit/10-bit/2-bit pixel format, which uses the same amount of memory but gives up alpha fidelity. Or you can go to a half-float (16-bit) pixel format, which doubles your memory use (since each channel was 8 bits before and is now 16 bits). Therefore, if you want to use wide color, you probably want deep color, which means you should be specifying an appropriate deep-color pixel format attribute when you create your OpenGL context. You probably want to specify NSOpenGLPFAColorFloat as well as NSOpenGLPFAColorSize 64. Note that, if you don’t use a floating point pixel format (meaning: you use a regular integral pixel format), you do get additional fidelity, but might not be able to represent values outside of the 0.0 - 1.0 range, depending on how the mapping of the integral units maps to the color space (which I don’t know).<br />
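The memory arithmetic here is easy to check. A small Python sketch comparing the pixel formats discussed above (the variable names are just illustrative labels):

```python
def bytes_per_pixel(channel_bits):
    # channel_bits: per-channel bit depths, e.g. (8, 8, 8, 8) for RGBA8888.
    return sum(channel_bits) // 8

rgba8888 = bytes_per_pixel((8, 8, 8, 8))      # 4 bytes; 2**8  = 256 color steps
rgb10a2  = bytes_per_pixel((10, 10, 10, 2))   # 4 bytes; 2**10 = 1024 color steps
rgba16f  = bytes_per_pixel((16, 16, 16, 16))  # 8 bytes; half floats can exceed 1.0

# The same 4 bytes per pixel buys 4x the color resolution if you cut the
# alpha channel down to 2 bits (4 levels); half floats double the memory.
```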
<br />
There’s one other interesting piece of interesting tech released in the past few years - A MacBook Pro with two GPUs (one integrated and one discrete) will switch between them based on which apps are running and which contexts have been created across the entire system. This switch occurs for all apps, which means that one app can cause the screen to change for all the existing apps. As mentioned before, this means that the renderer inside your OpenGL context could change at an arbitrary time, which means a well-behaved app should listen for these changes and respond accordingly. However, not all existing apps do this, which means that the switching behavior is entirely opt-in. This means that if any app is running which doesn’t understand this switching behavior, the system will simply pick a GPU (the discrete one) and force the entire system to use it until the app closes (or, if more than one naive app is running, until they all close). Therefore, no switches will occur when these apps are running, and the apps can run in peace. However, keeping the discrete GPU running for a long time is a battery drain, so it’s valuable to teach your apps how to react correctly to a GPU switch.<br />
<br />
Unfortunately, I’ve found that Cocoa doesn’t call NSOpenGLView’s “update” method when one of these GPU switches occurs. The switch is modeled in OpenGL as a change of the virtual screen of the OpenGL context. You can listen for a virtual screen change in two possible ways:<br />
<ul>
<li>Add an observer to the default NSNotificationCenter to listen for the NSWindowDidChangeScreenNotification</li>
<li>Use CGDisplayRegisterReconfigurationCallback</li>
</ul>
<br />
If you’re rendering to the screen, then using NSNotificationCenter should be okay because you’re using Cocoa anyway (because the only way to render to the screen is by using Cocoa). There’s no way to associate a CGL context directly with an NSView without going through NSOpenGLContext. If you’re not rendering to the screen, then presumably you wouldn’t care which GPU is outputting to the screen.<br />
<br />
Inside these callbacks, you can simply read the currentVirtualScreen property on NSOpenGLView (or use CGLGetVirtualScreen() - Cocoa will automatically call the setter when necessary). Once you’ve detected a virtual screen change, you should probably re-render your scene because the contents of your view will be stale.<br />
<br />
After you’ve implemented support for switching GPUs, you then have to tell the system that the support exists, so that it won’t take the legacy approach of choosing one GPU for the lifetime of your app. You can do this either by setting NSSupportsAutomaticGraphicsSwitching = YES in your Info.plist inside your app’s bundle, or, if you’re using CGL, you can use the kCGLPFASupportsAutomaticGraphicsSwitching pixel format attribute when you create the context. Luckily, CGLPixelFormatObj and NSOpenGLPixelFormat can be freely converted between (likewise with CGLContextObj and NSOpenGLContext).<br />
<br />
Now that you’ve told the system you know how to switch GPUs, the system won’t force you to use the discrete GPU. However, if you naively create an OpenGL context, you will still get the discrete GPU by default. What you now have is the ability to specify that you would prefer the integrated GPU. You do this by specifying that you would like an “offline” renderer (NSOpenGLPFAAllowOfflineRenderers).<br />
<br />
So far, I’ve discussed how we go about rendering into an NSView. However, there are a few other rendering destinations that we can render into.<br />
<br />
The first is: no rendering destination. This is considered an “offscreen” context. You can create one of these contexts by never setting the context’s view (which NSOpenGLView does for you). One way to do this is to simply create the context with CGL, and then never touch NSOpenGLView.<br />
<br />
Why would you want to do this? Because OpenGL commands you run inside an offscreen context still execute. You can use your newly constructed context to create a framebuffer object, and render to an OpenGL renderbuffer. Then, you can read the results out of the renderbuffer with glReadPixels(). If your goal is rendering a 3D scene, but you aren’t interested in outputting it on a screen, this is the way to do it.<br />
<br />
Another destination is a CoreAnimation layer. In order to do this, you would use a CAOpenGLLayer or NSOpenGLLayer. The layer owns and creates the OpenGL context and pixel format; however, it does this with input from you. The idea is that you would subclass CAOpenGLLayer/NSOpenGLLayer and override the copyCGLPixelFormatForDisplayMask: method (and/or the copyCGLContextForPixelFormat: method). When CoreAnimation wants to create its context, it will call these methods. By supplying the pixel format method, you can specify that, for example, you want an OpenGL version 4 context rather than a version 2 context. Then, when CoreAnimation wants you to render, it will call a draw method which you should override in your subclass and perform any drawing you prefer. By default, it will only ask you to draw in response to setNeedsDisplay, but you can set the “asynchronous” flag to ask CoreAnimation to continually ask you to draw.<br />
<br />
Another destination is an IOSurface. An IOSurface is a buffer which can live in graphics memory which can represent a 2D image. The interesting part of an IOSurface is that it can be shared across process boundaries. If you do that, you have to implement synchronization yourself between the multiple processes. It’s possible to wrap an OpenGL texture around an IOSurface, which means you can render to an IOSurface with render-to-texture. If you create a framebuffer object, create a texture from the IOSurface using CGLTexImageIOSurface2D(), bind the texture to the framebuffer, then render into the framebuffer, the result is that you render into the IOSurface. You can share a handle to the IOSurface by using IOSurfaceCreateXPCObject(). Then, if you manage synchronization yourself, you can have another process read from the IOSurface by locking it with IOSurfaceLock() and getting the pointer to the mapped data with IOSurfaceGetBaseAddressOfPlane(). Alternately, you can set it as the “contents” of an CoreAnimation layer. Or, you could use it in another OpenGL context in the other process.Litherumhttp://www.blogger.com/profile/12738405376090442005noreply@blogger.com1tag:blogger.com,1999:blog-8778351438463999796.post-38094288640285401122016-04-09T19:17:00.001-07:002016-11-15T10:02:17.518-08:00GPU Text Rendering OverviewThere are a few different ways to render text using the GPU. I’ll be discussing a few ways here. All these different ways represent general strategies - they can be mixed and matched. Also, this list may not be comprehensive, but I’ll only discuss the approaches that I’m familiar with. Also, note that I’m only interested in computing coverage here - not color.<br />
<br />
First: a little background on text. Glyphs are just sequences of bezier paths (I’m ignoring “sbix” glyphs and things like that). I’m only interested in TrueType / OpenType fonts, so the form of the glyphs is given in the ‘glyf’ table or the ‘CFF ‘ table. The ‘glyf’ table only describes quadratic bezier curves, while the ‘CFF ‘ table can describe cubic bezier curves. At first, this may sound like a small difference, but it turns out that math involving cubic bezier curves is way more complicated than math involving quadratic bezier curves. (For example: finding the intersections of two cubic bezier curves involves finding roots of a 9th order polynomial - something which humanity is currently unable to compute in closed form.) Also, the winding-order is different for the two formats: the ‘glyf’ table encodes paths with a non-zero winding-order, while the ‘CFF ‘ table encodes paths with an even/odd winding-order. Subpaths are allowed to intersect. This means that you can’t assume that the right-hand-side of a contour is always inside the glyph.<br />
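To make the fill-rule difference concrete, here is a small Python sketch that computes a winding number by ray crossing and applies both rules. It uses straight-line contours for simplicity (real glyph outlines contain curves), with two nested counter-clockwise squares standing in for overlapping subpaths:

```python
def winding_number(q, subpaths):
    # Signed crossings of a rightward horizontal ray from q, summed over
    # every edge of every subpath: upward crossings count +1, downward -1.
    w = 0
    for pts in subpaths:
        for i in range(len(pts)):
            (x0, y0), (x1, y1) = pts[i], pts[(i + 1) % len(pts)]
            if (y0 <= q[1]) != (y1 <= q[1]):        # edge spans the ray's height
                x = x0 + (q[1] - y0) * (x1 - x0) / (y1 - y0)
                if x > q[0]:                        # crossing is right of q
                    w += 1 if y1 > y0 else -1
    return w

def filled_nonzero(q, subpaths):   # 'glyf'-style fill rule
    return winding_number(q, subpaths) != 0

def filled_evenodd(q, subpaths):   # 'CFF '-style fill rule
    return winding_number(q, subpaths) % 2 != 0

# Two nested counter-clockwise squares: the inner region has winding 2,
# so the non-zero rule fills it but the even/odd rule makes it a hole.
outer = [(0, 0), (10, 0), (10, 10), (0, 10)]
inner = [(3, 3), (7, 3), (7, 7), (3, 7)]
```

This is exactly why contour direction matters under the non-zero rule but not under the even/odd rule.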
<br />
<h2>
Texture Atlas</h2>
<br />
The first approach is kind of a hack. The text is still “rendered” on the CPU (the same way it has been done since the 70s), but the final image is uploaded into a texture atlas on the GPU. This approach actually makes the first use of a glyph slower (because of the additional upload step); however, subsequent uses of that glyph are much faster.<br />
<br />
If a subsequent use of a glyph is slightly different in position (within a pixel) or size, there are a couple things you can do. If the position is different (one usage has its origin on the left edge of a pixel, and another usage has its origin in the middle), you might be able to get away with simply relying on the GPU’s texture filtering hardware to interpolate the result. If that isn’t good enough for you, you could snap the glyph origins to a few quanta within a pixel, and consider glyphs which differ in this snapped origin to be unique. This approach works similarly for varying glyph sizes - you can either rely on the texture filtering hardware to scale the glyph, or you could snap to a size quanta. (Or both!)<br />
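One way to implement the snapping is to fold the quantized subpixel origin and size into the atlas cache key. A Python sketch (the atlas_key helper and the choice of quanta are hypothetical, purely for illustration):

```python
def atlas_key(glyph_id, size, origin_x, origin_y, quanta=4):
    # Cache key for an atlas entry. The subpixel part of the origin (and
    # the point size) is snapped to 1/quanta increments, so glyph uses
    # whose origins differ by less than half a quantum share one entry.
    # (atlas_key and quanta=4 are illustrative, not from any real API.)
    def snap(value):
        return round(value * quanta) / quanta
    return (glyph_id,
            snap(size),
            snap(origin_x - int(origin_x)),   # subpixel x offset
            snap(origin_y - int(origin_y)))   # subpixel y offset

# Origins 3.01 and 3.05 snap to the same quantum, so they share an entry;
# origin 3.4 snaps to a different quantum and gets its own rasterization.
```

Larger quanta mean fewer distinct rasterizations (less memory, more reuse) at the cost of positioning accuracy.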
<br />
<h2>
Signed Distance Field</h2>
<br />
Valve <a href="http://www.valvesoftware.com/publications/2007/SIGGRAPH2007_AlphaTestedMagnification.pdf">published</a> a similar approach which they use in the Team Fortress 2 game. Recall how in the previous approach, the value of each texel is coverage of that texel. Valve’s approach uses the same idea, except that the value of each texel is a “signed distance field.” This means that the value of each texel is a signed distance of closest approach to the boundary of the curve (signed because “inside” values are negative). Using this approach causes bilinear filtering to provide higher-quality results (or, put another way, you can achieve comparable results with fewer texels).<br />
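As a concrete (if simplified) illustration, here is a Python sketch that fills a signed distance field for a circle standing in for a glyph; a real implementation would compute the distance of closest approach to arbitrary bezier contours:

```python
import math

def circle_sdf(cx, cy, r, width, height):
    # Fill a texture where each texel stores the signed distance from the
    # texel center to the outline: negative inside, positive outside.
    # A circle stands in for a glyph because its distance is analytic.
    field = []
    for y in range(height):
        row = []
        for x in range(width):
            d = math.hypot(x + 0.5 - cx, y + 0.5 - cy) - r
            row.append(d)
        field.append(row)
    return field

field = circle_sdf(cx=8.0, cy=8.0, r=5.0, width=16, height=16)
# field[8][8] is negative (inside the shape); field[0][0] is positive.
# At draw time, a sample is "on" wherever the bilinearly-interpolated
# field value is below zero, which keeps edges sharp under magnification.
```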
<br />
A newer approach using signed distance fields has been <a href="https://github.com/Chlumsky/msdfgen">implemented</a> which uses additional color channels in the GPU texture to achieve higher-fidelity results.<br />
<br />
<h2>
Generated Geometry</h2>
<br />
The next approach is to generate geometry which matches the contours closely. GPUs can only render triangle geometry, so this means that this approach requires triangulating the input curve. One way to do this is to choose a constant triangle size. However, a better idea is to increase the triangle density in areas of high complexity, and to decrease the triangle density in areas of low complexity.<br />
<br />
This means you want to subdivide the bezier curves finely where the curve is sharp, and loosely where it is flat. Luckily, the <a href="https://en.wikipedia.org/wiki/De_Casteljau%27s_algorithm">De Casteljau</a> method already does this! If you subdivide a Bezier curve at equal parameter intervals using that method, the subdivision points will be closer together where the curve is sharp, and vice-versa.<br />
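A common way to apply this in practice is recursive subdivision with a flatness test: split the curve with one De Casteljau step at t = 0.5 until the control point is within a tolerance of the chord. A Python sketch for quadratic curves (the flatness criterion is one of several reasonable choices):

```python
def split_quadratic(p0, p1, p2, t=0.5):
    # One De Casteljau step: split a quadratic bezier into two quadratics.
    def lerp(a, b, t):
        return (a[0] + (b[0] - a[0]) * t, a[1] + (b[1] - a[1]) * t)
    q0 = lerp(p0, p1, t)
    q1 = lerp(p1, p2, t)
    r = lerp(q0, q1, t)               # the point on the curve at t
    return (p0, q0, r), (r, q1, p2)

def flatten(p0, p1, p2, tolerance, out):
    # Recurse until the control point sits within `tolerance` of the
    # chord, then emit the chord. Sharp regions recurse deeper, so they
    # automatically receive more (and shorter) line segments.
    dx, dy = p2[0] - p0[0], p2[1] - p0[1]
    chord = (dx * dx + dy * dy) ** 0.5 or 1.0
    # |cross product| / chord = perpendicular distance of p1 from chord.
    deviation = abs((p1[0] - p0[0]) * dy - (p1[1] - p0[1]) * dx)
    if deviation <= tolerance * chord:
        out.append((p0, p2))
        return
    left, right = split_quadratic(p0, p1, p2)
    flatten(*left, tolerance, out)
    flatten(*right, tolerance, out)

segments = []
flatten((0.0, 0.0), (50.0, 100.0), (100.0, 0.0), 1.0, segments)
# `segments` now approximates the arch with chords that join end-to-end.
```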
<br />
Once you’ve done the subdivision, you can use a “<a href="https://en.wikipedia.org/wiki/Constrained_Delaunay_triangulation">Constrained Delaunay Triangulation</a>” to actually run the triangulation. This is similar to a regular Delaunay triangulation, except that it can guarantee that particular edges are present in the triangulation. This means you can guarantee that no triangle will cross a contour. Therefore, each triangle can be considered to be entirely inside or entirely outside the glyph, and can be shaded accordingly.<br />
<br />
<h2>
Stencil Buffer Approach</h2>
<br />
If you don't want to run that triangulation, you can use the GPU’s stencil buffer (or equivalent) to calculate coverage instead. The idea is that you use the subdivision points to model the contours as a sequence of line segments. Then, you pick a point (let’s call it P) way off somewhere (it can be arbitrary), and, for every line segment, form a triangle with that line segment and that point P. When you do that, you’ll have lots of overlapping triangles.<br />
<br />
You can then set up the stencil buffer to say “increment the texel’s counter if the triangle you’re shading has positive area, and decrement the texel’s counter if the triangle you’re shading has negative area” (where “negative area” and “positive area” refer to shading the “front” or “back” of the triangle, and are determined by whether the points are submitted in a clockwise or counter-clockwise direction). If you shade all the triangles like this, all the overlapping triangles cancel out, and you're left with nonzero counters in all the places where the glyph lies. You can then set up the stencil buffer to say “only output a value if the stencil buffer has a nonzero value.” Note that this only works for font files which use the nonzero winding rule.<br />
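The cancellation argument can be checked on the CPU. This Python sketch emulates what the stencil hardware does for a single sample point: one triangle per contour segment and the (arbitrary) far point, with front-facing triangles incrementing the counter and back-facing ones decrementing it.

```python
def signed_area(a, b, c):
    """Twice the signed area of triangle abc; positive when the
    vertices are counter-clockwise."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def contains(tri, p):
    """Point-in-triangle test that works for either winding."""
    a, b, c = tri
    d1, d2, d3 = signed_area(a, b, p), signed_area(b, c, p), signed_area(c, a, p)
    return (d1 >= 0 and d2 >= 0 and d3 >= 0) or (d1 <= 0 and d2 <= 0 and d3 <= 0)

def winding_count(contour, sample, far=(-10.0, -3.0)):
    """Emulate the stencil trick per sample: for each contour segment,
    form a triangle with the far point P; count +1 when the triangle
    is wound counter-clockwise and -1 when clockwise."""
    count = 0
    for a, b in zip(contour, contour[1:] + contour[:1]):
        tri = (far, a, b)
        if contains(tri, sample):
            count += 1 if signed_area(*tri) > 0 else -1
    return count

square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]  # counter-clockwise
```

For a point inside the square the overlapping triangles leave a count of 1; for a point outside, they cancel to 0, just as the stencil counters do.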
<br />
This approach has the obvious quality/speed tradeoff of the subdivision density. The higher the density, the slower the rendering, but the better-looking the results. Also, you want the subdivision density to be (somewhat) proportional to the font size, since you want the subdivision density to be roughly equal in screen space for all glyphs rendered. Unfortunately, it’s difficult to use this approach to get high-quality rendering without super tiny triangles.<br />
<br />
<h2>
Loop-Blinn Method</h2>
<br />
In the first method, glyph coverage information was represented by a texture. In the third method, glyph coverage information was represented by geometry. Another method (called the <a href="http://http.developer.nvidia.com/GPUGems3/gpugems3_ch25.html">Loop Blinn method</a>) can represent glyph coverage information by using mathematical formulas. This method tries to represent a particular contour in a way that can be computed inside a fragment shader.<br />
<br />
In particular, in order to draw a contour, you draw a single triangle which encompasses the entire contour. Inside the triangle, you define a scalar field where each point inside the triangle has a scalar value associated with it. You can create this scalar field in such a way that the following attributes hold:<br />
<br />
<ul>
<li>Scalar values which are negative are “inside” the contour, and scalar values which are positive are “outside” the contour</li>
<li>Calculating the scalar value at any point in the field can be done in closed form, knowing only that point’s location within the triangle plus some interpolated information associated with the vertices of the triangle</li>
</ul>
<br />
This means that, given some vertex attributes, you can run some math in a pixel shader which will tell you if the shaded point is inside or outside the contour. So, for each contour, you consider a single triangle which includes the contour, and you then calculate some magic values to associate with each vertex of the triangle. Then, you shade that triangle with a fairly simple pixel shader which computes a closed-form equation to determine the coverage.<br />
<br />
Note that the formulas involved with the Loop-Blinn method are much simpler for quadratic Bezier curves than for cubic Bezier curves. However, the general approach still works for cubic curves - the difference is that the formulas are bigger (and you need to perform an additional classification step).<br />
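To make the quadratic case concrete, here is a CPU-side Python sketch of the standard Loop-Blinn trick for one quadratic segment: the three control points carry the canonical texture coordinates (0,0), (1/2,0), and (1,1), the rasterizer interpolates them (emulated here with barycentric coordinates), and the “fragment shader” evaluates u² − v, which is zero on the curve, negative on the chord side, and positive on the control-point side.

```python
def loop_blinn_quadratic(p, tri):
    """Evaluate the Loop-Blinn implicit for one quadratic segment.
    tri holds the segment's three control points, which carry the
    canonical texture coordinates (0,0), (1/2,0), (1,1). Returns
    u^2 - v at p: zero on the curve, negative on the chord side."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    # Barycentric coordinates of p in the control triangle -- this is
    # the interpolation the rasterizer would perform for us.
    denom = (y1 - y2) * (x0 - x2) + (x2 - x1) * (y0 - y2)
    l0 = ((y1 - y2) * (p[0] - x2) + (x2 - x1) * (p[1] - y2)) / denom
    l1 = ((y2 - y0) * (p[0] - x2) + (x0 - x2) * (p[1] - y2)) / denom
    l2 = 1 - l0 - l1
    u = l0 * 0.0 + l1 * 0.5 + l2 * 1.0  # interpolated (u, v)
    v = l0 * 0.0 + l1 * 0.0 + l2 * 1.0
    return u * u - v

# A quadratic from (0,0) to (2,0) with off-curve control point (1,2);
# the curve itself passes through (1,1) at t = 0.5.
tri = [(0.0, 0.0), (1.0, 2.0), (2.0, 0.0)]
```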
<br />
Also note that this approach can still use the Constrained Delaunay Triangulation, because you still need to generate triangles that lie entirely within the bounds of the glyph. However, there is no need to do the heuristic subdivision like in the previous method; instead, all the triangles of the mesh are created from the control points of the contours themselves.<br />
<br />
This means that the quality of the curves is defined by mathematical formulas, which means that it is effectively infinitely scalable. In fact, the information in the glyph contours can be losslessly converted to the information used for the Loop-Blinn method.<br />
<br />
Overall, these methods are not monolithic things, and can be used in conjunction with one another. For example, you could use the stencil buffer approach to shade the insides of the glyph, but the Loop-Blinn method to shade the contours themselves (so that you don’t have to do any subdivision). These algorithms represent general approaches, and should be used as the basis for further thought (rather than simply coding them up wholesale).<br />
<br />
Antialiasing with each of these methods is a pretty interesting discussion, but I’ll save that for another post.<br />
<br />
Litherum (http://www.blogger.com/profile/12738405376090442005)<br />
<br />
<h2>
CSS Box Model</h2>
<br />
Click for higher resolution.
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEifAw-JZjG197E_H-Hlj392MLtYM7D8_ij24L2cP00J-L5A5I7Wsl2Qd6fV5SRCIaOcRAh4zP87Z8bgNaR885O5b4ryWZiS4xyd6Mtm5pAYzPUDzZ3pxxAHRkVHlqIGrMF67EEhIHXhl28Q/s1600/IMG_0001.jpg" imageanchor="1"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEifAw-JZjG197E_H-Hlj392MLtYM7D8_ij24L2cP00J-L5A5I7Wsl2Qd6fV5SRCIaOcRAh4zP87Z8bgNaR885O5b4ryWZiS4xyd6Mtm5pAYzPUDzZ3pxxAHRkVHlqIGrMF67EEhIHXhl28Q/s320/IMG_0001.jpg" style="height: 100%; width: 100%;" /></a></div>
<br />
<h2>
Color Blending</h2>
<br />
Color blending is something which is done all the time. Your computer is doing it right now. Literally. However, it requires a little bit of thought to get it right.<br />
<br />
There are four pieces involved when blending:<br />
<ul>
<li>Source color</li>
<li>Coverage information (alpha)</li>
<li>Destination color</li>
<li>A working color space</li>
</ul>
<br />
We can all agree on what coverage is. You simply model each sample as a square, or rectangle, or circle, and decide on how much of that area is covered by the foreground. Therefore, it is fractional and unitless (because it’s a ratio). It can never be greater than 1 or less than 0. It is associated with the foreground because it represents the geometry in the foreground. You may have more than one value per sample (for example, if you are interested in each sub-pixel individually).<br />
<br />
The working color space is the color space that our blending computations are performed in. This color space must be a linear color space. This means that, if you take a number which represents one channel of a color and double it, the light intensity it represents must also exactly double.<br />
<br />
sRGB is not a linear color space. However, we can come up with a conceptual “linearized sRGB” color space which uses the same primaries as sRGB and the same 0-point and 1-point, but uses linear interpolation between 0 and 1. Converting from sRGB into this new color space is simply raising each value to the 2.2 power. (The conversion is actually a little more complicated than that - it uses a piecewise function - but we’re only discussing the conceptual model here.)<br />
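For reference, the actual piecewise transfer functions look like this (a Python sketch; the constants come from the sRGB specification):

```python
def srgb_to_linear(c):
    """Decode one sRGB channel value (0..1) to linear light.
    This is the piecewise function approximated by pow(c, 2.2)."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

def linear_to_srgb(c):
    """Encode one linear channel value (0..1) back to sRGB."""
    return c * 12.92 if c <= 0.0031308 else 1.055 * c ** (1 / 2.4) - 0.055
```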
<br />
So, the first step is to convert all our colors into this working color space. Then, each blending operation is performed by taking a weighted average of the color primaries’ values, using the coverage information as a weight. Successive blending operations are performed back-to-front.<br />
<br />
The formula for this weighted average is:<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">Source Primary * Source Alpha + Destination Primary * (1 - Source Alpha)</span><br />
<br />
You can see that if the alpha is 0, the result is equal to the destination, and if the alpha is 1, the result is equal to the source. This is simply a linear interpolation between the two.<br />
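In code, the non-premultiplied blend is just a per-channel lerp (a Python sketch, assuming all values are already in the linear working space):

```python
def blend(src, src_alpha, dst):
    """Back-to-front 'over' for non-premultiplied colors: a linear
    interpolation between destination and source, per channel."""
    return tuple(s * src_alpha + d * (1 - src_alpha)
                 for s, d in zip(src, dst))

src = (0.25, 0.5, 0.75)
dst = (1.0, 0.0, 0.5)
```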
<br />
Now, it turns out that the requirement of rendering items from back-to-front is greatly constraining. It means that if we have a whole bunch of items to blend together, we can’t precompute the result of blending certain items together, and then blend those with a background. This is because the formula above is not associative.<br />
<br />
However, a very similar formula is associative:<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">Source Primary + Destination Primary * (1 - Source Alpha)</span><br />
<br />
The only difference between this formula and the original is the replacement of “Source Primary * Source Alpha” with “Source Primary.”<br />
<br />
Well, let’s come up with a new concept, called a “premultiplied color.” This is the same thing as a regular color, except the values in the primaries’ channels have already been multiplied by the alpha of the color. This is possible because the color primaries’ values and the alpha channel have the same lifetime, so we can perform the multiplication at the time when this object is created.<br />
<br />
Well, we can see that if we use these objects in the associative formula, we get the same answer as before (because the new “Source Primary” is equal to the old “Source Primary” times the Source Alpha; this multiplication is performed inside the “Premultiplied Color” object). However, we get the benefit of using an associative formula.<br />
<br />
Therefore, with premultiplied colors, you can blend in any order. It’s worth noting that using premultiplied colors is not a requirement - if you blend out-of-order with premultiplied colors, you will get the exact same result as if you had blended back-to-front with non-premultiplied colors. This also means that you can start blending out-of-order, but if you notice that you have blended all the deepest items, you can transition to non-premultiplied blending halfway through all your blending operations. The answer will be the same.<br />
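The associativity claim is easy to verify numerically. In this Python sketch, colors are stored premultiplied, with alpha appended as a fourth channel (conveniently, the alpha channel obeys the very same formula):

```python
def premultiply(color, alpha):
    """Store the color with its channels already scaled by alpha."""
    return tuple(c * alpha for c in color) + (alpha,)

def over(src, dst):
    """Premultiplied 'over'. Because the source term is no longer
    scaled at blend time, this operator is associative."""
    return tuple(s + d * (1 - src[-1]) for s, d in zip(src, dst))

a = premultiply((0.9, 0.2, 0.1), 0.5)
b = premultiply((0.1, 0.8, 0.3), 0.25)
c = premultiply((0.0, 0.0, 1.0), 1.0)  # opaque background

back_to_front = over(a, over(b, c))
front_to_back = over(over(a, b), c)
```

Up to floating-point rounding, both groupings produce the same composite, which is exactly what lets you precompute partial blends.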
<br />
By contrast, using a linear working color space is a hard requirement. If you don’t do this, your math will yield values which are meaningless. Once you’re done with all your blending, you usually want to write out the output in a well-known colorspace (such as sRGB), which means you usually have to un-linearize the result just before output.<br />
<br />
Because of this, linearization / unlinearization should be the first and last steps. Premultiplication and unpremultiplication should be the second and second-to-last steps (if they are used at all). Premultiplication is optional and you can even unpremultiply halfway through your calculations if some conditions are met.<br />
<br />
Note that linearizing / unlinearizing sRGB can have some pretty dramatic results. For example, if you blend pure black and pure white (technically “sRGB black” and “sRGB white”) with 50% alpha, you end up with your (resulting sRGB) primaries having values of 74%, nowhere near the 50% you would get if you performed the same calculation (incorrectly) in the non-linear sRGB space.
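That 74% figure is easy to reproduce (a Python sketch using the piecewise sRGB encode):

```python
def linear_to_srgb(c):
    # Piecewise sRGB encode (the inverse of linearization).
    return c * 12.92 if c <= 0.0031308 else 1.055 * c ** (1 / 2.4) - 0.055

# sRGB black and white are also 0.0 and 1.0 in linear light, so
# blending them at 50% alpha in the linear space gives 0.5 ...
linear_result = 0.0 * 0.5 + 1.0 * 0.5
# ... which encodes back to roughly 74% in sRGB.
print(round(linear_to_srgb(linear_result), 3))  # → 0.735
```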