Saturday, December 19, 2015
Sunday, December 13, 2015
Methods of writing Windows apps
Win32
In the beginning, there was just Win32. This is a C-based API which has been around since Windows 95. In this API, UI widgets (such as buttons) were known as “Windows” (represented in code as HWND) and windows could have windows inside of them. There was no concept of automatic layout; instead, each window had to be explicitly placed and sized by the programmer.
When you have a window, you can create a “device context” which is associated with that window. This device context is what we now would call a 2D drawing context, and it has API functions which let you draw lines and circles and text and that kind of stuff. All these drawing operations were performed on the CPU by a library named “GDI.”
Interactivity was fairly primitive. Applications would create their own runloop. Each turn of the runloop calls GetMessage(), which corresponds with an event that occurred. However, because some messages need to be handled by the system, the runloop just turns around and delivers the event back to the system. Before doing this, you register a callback function which the system will invoke when it wants to. This is where you are told things like “you should paint your contents now” or “the user pressed a key on the keyboard.” Because each message has an enum type, this callback function will switch over the type, and react to all the messages it understands.
This API is 20 years old, and should not be used for new projects. However, it is still supported today on current Windows operating systems.
Microsoft Foundation Classes (MFC) was the next suggested method of writing Windows apps. This is a C++ wrapper around the Win32 API; it was a new way of doing what people were already doing. Similar to Win32, it shouldn't be used anymore.
.NET
.NET represents a new approach to Windows desktop software development. There is a new language, a new object model, a new ABI, garbage collection, etc. There are two ways that .NET applications could be used to make graphical applications: WinForms and WPF. WinForms simply internally uses the Win32 API, and therefore isn’t very interesting.
WPF, however, represents a clean break from Win32 GUI programming. This API doesn’t use GDI for drawing, doesn’t use HWNDs, and doesn't require manual layout. Instead, development starts with a declarative language called XAML. This is an XML-based language where regions of the screen are represented as elements in XAML. Each element has an associated .NET class (in the System.Windows namespace) which allows you to do everything in code that the XAML file does. These elements aren’t HWNDs; instead, they are their own unrelated type.
Because HWNDs aren’t involved, drawing isn’t done with GDI. Instead, it is done entire on top of Direct3D, which means that the drawing routines are hardware accelerated. It also means that WPF controls look different from their GDI counterparts; a WPF app is recognizable.
WinRT
WinRT is the suggested method of creating a GUI application. WinRT’s GUI model is set up similarly to WPF, except that it has an entirely new set of elements (which reside in the Windows.UI namespace). It still employs XAML for representing elements in a declarative form.
For drawing a specific control, WinRT uses Direct2D, a hardware-accelerated drawing library. This library is conceptually on top of Direct3D, but it is also available for arbitrary developer use.
Once an individual element is drawn, multiple elements are composited together using Windows Composition. This library allows for creating a retained-mode hardware-accelerated tree of visual objects. This API is internally used to composite all the controls in the window. It even has a public API so you can interact with the composition tree directly.
As opposed to Win32, WinRT has an application model. You don’t have to write your own event loop with WinRT. Applications are implemented as a subclass of Windows::ApplicationModel::Core::IFrameworkView. This class has 5 functions which the application overrides in order to react to lifecycle events. The event loop is implemented in CoreApplication::Run, which takes an IFrameworkViewSource, which is a factory for IFrameworkViews.
Event handling is performed explicitly. Class definitions in C++/CX have properties, methods, and events. If you want to react to an event, you simply construct an Windows::Foundation::EventHandler object, and use the += operator to attach it to the object’s event field. The EventHandler object is a wrapper around a function pointer or a lambda. As an example, UIElement has a “PointerMoved” event.
Layout is achieved by encapsulating different layout algorithms inside different controls. For example, there is a Windows::UI::XAML::Controls::Grid control, which arranges its contents in a grid. Different controls arrange their contents according to their own internal algorithms. The most basic example is the Canvas class, which allows the programmer to explicitly lay out its child controls (so you’re not limited by the classes that come with the library).
Inside these callbacks, you can modify any part of the XAML tree you want in order to change the rendering of your app. From code, the way you identify elements in the XAML tree is pretty interesting. Each XAML file is represented in native code by a class which extends Windows::UI::Xaml::Controls::Page. When you build your project, the XAML file is used to create an auto generated header. Nodes in the XAML file can have a “name” attribute, which represents the name that native code would refer to the node as. The auto generated header contains each of these variables of the right type and name. Then, the auto generated header’s content is mixed into your Page subclass by using the support for “partial” classes. This allows the class declaration to be defined in two files; the conceptual class is the union of the two classes. Therefore, these auto generated variables automatically become class members, and are therefore in scope for all the methods of the class. In the end, this means that you don’t have to do any typing; you just immediately have access to your nodes by name.
Once you have access to the elements, you can change any of their properties, run any of their methods, and listen for any of their events.
Saturday, December 12, 2015
Binary Compatibility of Windows Programs
COM
Around 1993, there was an issue around binary compatibility of object-oriented programs. The problem is that callers and callees must agree on the internal layout of structs in memory. This means, however, that if a callee is updated (as libraries often are) and changes this memory layout, the program will not behave as expected.
Microsoft came up with a solution to this problem by creating an “object model” (named “COM”) which objects that traverse library boundaries should adhere to. The object model has a few rules:
- Objects must not have any data that is exposed. Therefore, the only thing publicly available from an object is functions to call on the object.
- The object’s functions are represented in memory as an array of pointers to functions. The order of this array matches the order that the functions are declared in source code.
You may recognize this design as an “interface.” The function array is a vtable. Therefore, mapping this idea to C and C++ is very natural. Because the only things that exist in the objects is a vtable, there are no problems with padding or alignment of data members or anything like that. Inheritance is implemented by simply adding more function pointers on to the end of the array.
This memory layout also means that many languages can interact with these COM objects.
There are two more interesting pieces to this object model: there are 3 functions that every object must implement. This can be thought of as the root of the inheritance hierarchy. In C++, this base class is known as IUnknown. The three functions are:
- AddRef()
- QueryInterface()
- Release()
QueryInterface() is pretty interesting. It is a way of converting one object into another type. The idea is that all COM types have an associated ID number. When you call QueryInterface(), you pass in an identifier of a type that you want. If the object knows how to represent itself as that type, it returns an interface of the correct type. Therefore, it’s sort of like a dynamic_cast, except that the source and destination types don’t have to be related at all. This is actually a really powerful concept, because it provides a mechanism for interoperability between APIs. A new library can be released, and its objects might want to support being used as the objects in another library; this can naturally be achieved by supporting QueryInterface().
In every COM library I’ve used, functions use out params instead of return values. Instead, they return an HRESULT error code. Therefore, it’s common to see every call to a COM object wrapped in a block which checks the return code.
.NET
Around 10 years later, Java was becoming popular. One of Java’s goals was to allow programs to run on any machine without requiring recompilation. It achieved this goal by compiling to a “virtual machine” bytecode. Then, at runtime, an interpreter would run the bytecode.
This approach means that all memory accesses and function calls operate through an interpreter, so the interpreter can make sure that nothing bad happens (for example: a program can never segfault). It also means that the virtual machine is free to implement any object model it wants; all object accesses pass through it, so it is free to, for example, move objects around in memory whenever it wants. This naturally leads to a garbage-collected language, where the garbage collector is implemented in the virtual machine.
(Now, I’d like to make an aside about this “managed code” business. It isn’t really true that all Java programs are interpreted; the virtual machine will JIT the program and point the program counter into the JITted code. Whenever the interpreter needs to do some work, it just makes the generated bytecode call back into its own code. Therefore, nothing is being done here that couldn’t be done with a simple software library. Indeed; this managed “runtime” is the same fundamental idea as the raw C “runtime”; it’s just that many operations which would be raw instructions in C are instead implemented as function calls into the runtime. If you are diligent about all your object accesses, you can make your C++ program garbage collected. “Managed code” doesn’t really mean anything mechanical; instead, it is a way of thinking about the runtime model of your program.)
Microsoft wanted to be able to use this model of “managed code” in their own apps, so they invented a system quite similar to Java’s. You would compile your apps to a “CLI” form, which is a platform-agnostic spec describing a virtual machine. Then, the Microsoft CLR interprets and runs your program. A new language (C#) was invented which nicely maps all its concepts to CLI concepts.
There are a couple extra benefits of having a well-specified intermediary form. One is that there can be multiple languages which compile to your CLI. Indeed, Microsoft has many languages which can be compiled to the CLI, such as F# and VB.NET. Another benefit is that the standard library can exist in the runtime, which means that all of those languages can call into the standard library for free. Each language doesn’t need its own standard library.
A pretty interesting benefit of this virtual machine is that it solves the binary compatibility problem. The fundamental problem with binary compatibility is that one library might reach inside an object vended by another library; with a virtual machine, that “reaching” must go through the runtime. The runtime can then make sure that everything will work correctly. So, we now have another solution for the binary compatibility problem.
C++/CLI
Remember how I mentioned earlier how that this concept of “runtime” is really just a library of functions that you should call when you want to do things (like call a function or access a property of an object)? Well, C++ supports linking with libraries… there is no reason why one shouldn’t be able to call into these runtime libraries from a C++ program.
The difficulty, however, is being diligent with your accesses. Every time you do anything with one of these “managed” objects, you must call through the runtime. This is a classic case of where the language can help you; you could imagine augmenting C++ to be able to naturally represent the concept of a managed object, and enforcing all interactions with such objects to be done through the runtime.
This is exactly what C++/CLI is. It is a superset of the C++ language, and it understands what managed objects are. These managed objects are fundamentally different from C++ objects; they are represented using different syntax. For example, when you create a managed object, you use the keyword “gcnew” instead of “new”, in order to remind you that these objects are garbage collected. A reference to a managed object uses the ^ character instead of the * character. Creating a reference to a managed object is done with the % character instead of the & character. These new types are a supplement to the language; all the regular C++ stuff still exists. This way, the compiler forces you to interact with the CLI runtime in the correct way.
Windows Runtime
The currently accepted way to write Windows programs is using the Windows Runtime (WinRT). You may think that WinRT is a runtime just like the .NET runtime discussed above; however, this isn’t quite right. WinRT objects are integrated with the .NET runtime, and are considered “managed” in the same way that .NET objects are considered “managed.” However, they aren’t binary compatible with .NET objects; you can’t interact with WinRT objects using the same functions you use to interact with .NET objects.
Therefore, C++/CLI doesn’t work for WinRT objects. Instead, a similarly-looking language, C++/CX, was invented to interact with these WinRT objects. It keeps some of the syntax from C++/CLI (such as ^ and %) but replaces others (such as “ref new” instead of “gcnew”).