Wednesday, November 16, 2016

Single Screen GPU Handoff

Over the past few years, a collection of laptops have been released with two graphics cards. The idea is that one is low-power and one is high power. When you want long battery life, you can use the low-power GPU, but when you want high performance, you can use the high-power GPU. However, there is a wrinkle: the laptop only has one screen.

The screen’s contents have to come from somewhere. One way to implement this system would be to daisy-chain the two GPUs, thereby keeping the screen always plugged into the same GPU. In this system, the primary GPU (which the screen is plugged into) would have to be told to give the results of the secondary GPU to the screen.

A different approach is to connect both GPUs in parallel with a switch between them. The system will decide when to flip the switch between each of the GPUs. When the screen is connected to one GPU, the other GPU can be turned off completely.

The question, then, is how this looks to a user application. I’ll be investigating three different scenarios here. Note that I’m not discussing what happens if you drag a window between two different monitors each plugged into a separate card; instead, I’m discussing the specific hardware which allows multiple graphics cards to display to the same monitor.

OpenGL on macOS

On macOS, you can tell which GPU your OpenGL context is running on by running glGetString(GL_VENDOR). When you create your context, you declare whether or not you are capable of using the low-power GPU (the high-power GPU is the default). macOS has the design where if any context requires the high-power GPU, the whole system is flipped to use it. This is observable by using gfxCardStatus. This means that the whole system may switch out from under you while your app is running because of something a completely different app did.

For many apps, this isn’t a problem because macOS will copy your OpenGL resources between the GPUs, which means your app may be able to continue without caring that the switch occurred. This works because the OpenGL context itself survives the switch, but the internal renderer changes. Because the context is still alive, your app can likely continue.

The problem, though, is with OpenGL extensions. Different renderers support different extensions, and app logic may depend on the presence of an extension. On my machine, the high-powered GPU supports both GL_EXT_depth_bounds_test and GL_EXT_texture_mirror_clamp, but the low-powered one doesn’t. Therefore, if an app relies on an extension, and the renderer changes in the middle of operation, the app may malfunction. The way to fix this is to listen to the NSWindowDidChangeScreenNotification in the default NSNotificationCenter. When you receive this notification, re-interrogate the OpenGL context for its supported extensions. Note that switching in both directions may occur - the system switches to the high-power GPU when some other app is launched, and the system switches back when that app is quit.

You only have to do this if you opt-in to running on the low-power GPU, because if you don’t opt in, you will run on the high-power GPU, which means your app will be the app keeping the system on the high-power GPU, which means the system will never switch back while your app is alive.

Metal on macOS

Metal takes a different approach. When you want to create a MTLDevice, you must choose which GPU your device reflects. There is an API call, MTLCopyAllDevices(), which will simply return a list, and you are free to interrogate each device in the list to determine which one you want to run on. In addition, there’s a MTLCreateSystemDefaultDevice() which will simply pick one for you. On my machine, this “default device” isn’t magical - it is simply exactly equal (by pointer equality) to one of the items in the list that MTLCopyAllDevices() returns. On my machine, it returns the high-powered GPU.

However, MTLDevices don’t have the concept of an internal renderer. In fact, even if you cause the system to change the active GPU (using the above approach of making another app create an OpenGL context), your MTLDevice still refers to the same device that it did when you created it.

I was suspicious of this, so I ran a performance test. I created a shader which got 28 fps on the high-powered GPU and 11 fps on the low-powered one. While this program was running on the low-powered GPU, I opened up an OpenGL app which I knew would cause the system to switch to the high-powered GPU, and I saw that the app’s fps didn’t change. Therefore, the Metal device doesn’t migrate to a new GPU when the system switches GPUs.

Another interesting thing I noticed during this experiment was that the Metal app was responsive throughout the entire test. This means that the rendering was being performed on the low-power GPU, but the results were being shown on the high-power GPU. I can only guess that this means that the visual results of the rendering are being copied between GPUs every frame. This would also seem to mean that both GPUs were on at the same time, which seems like it would be bad for battery life.

DirectX 12 on Windows 10

I recently bought a Microsoft Surface Book which has the same kind of setup: one low-power GPU and one high-power GPU. Similarly to Metal, when you create a DirectX 12 context, you have to select which adapter you want to use. IDXGIFactory4::EnumAdapters1() returns a list of adapters, and you are free to interrogate them and choose which one you prefer. However, there is no separate API call to get the default adapter; there is simply a convention that the first device in the list is the one you should be using, and that it is the low-power GPU.

As I stated above, on macOS, switching to the discrete GPU is all-or-nothing - the screen’s signal is either coming from the high-power GPU or the low-power GPU.  I don’t know whether or not this is true on Windows 10 because I don’t know of a way to observe it there.

However, an individual DirectX 12 context won’t migrate between GPUs on Windows 10. This is observable with a similar test as the one described above. Automatic migration occurred on previous versions of Windows, but it doesn’t occur now.

Therefore, the model here is similar to Metal on macOS, so it seems like the visual results of rendering are copied between the two cards, and that both cards are kept on at the same time if there are any contexts executing on the high-power GPU.

However, the Surface Book has an interesting design: the high-power GPU is in the bottom part of the laptop, near the keyboard, and the laptop’s upper (screen) half can separate from the lower half. This means that the high-power GPU can be removed from the system.

Before the machine’s two parts can be separated, the user must press a special button on the keyboard which is more than just a physical switch. It causes software to run which inspects all the contexts on the machine to determine if any app is using the high-powered GPU on the bottom half of the machine. If it is being used by any app, the machine refuses to separate from the base (and shows a pop up asking the user to please quit the app, or presumably just destroy the DirectX context). There is currently no way for the app to react to the button being pressed so that it could destroy its context. Instead, currently, the user must quit the app.

However, it is possible to lose your DirectX context in other ways. For example, if a user connects to your machine via Terminal Services (similar to VNC), the system will switch from a GPU-accelerated environment to a software-rendering environment. To an app, this will look like the call to IDXGISwapChain3::Present() will return DXGI_ERROR_DEVICE_REMOVED or DXGI_ERROR_DEVICE_RESET. Apps should react to this by destroying their device and re-querying the system for the present devices. This sort of thing will also happen when Windows Update updates GPU drivers or when some older Windows versions (before Windows 10) perform a global low-power to high-power (or vice-versa) switch. So, a well-formed app should already be handling the DEVICE_REMOVED error. Unfortunately, this doesn’t help the use case of separating the two pieces of the Surface Book.

Thanks to Frank Olivier for lots of help with this post.

1 comment:

  1. There is a way to specify the enumeration order of adapters on Surface Book. It's the same as NV Optimus trick. If you define:

    extern "C"
    __declspec(dllexport) DWORD NvOptimusEnablement = 0x00000001;

    in your exe module, the EnumAdapter will returns NV GPU first, then the Intel's.

    Also, the only new thing to D3D 11.4 is an interface of device remove notification. You can set a event handle instead of watching DXGI_ERROR_DEVICE_REMOVED all the time. But it's wired that Surface Book doesn't support that. I don't know if it's because Surface Book' driver is WDDM 2.0, not of 2.1.