Saturday, October 28, 2023

ReSTIR Part 2: Characterizing Sample Reuse

After enumerating all the building blocks of ReSTIR, there isn't actually that much more. The rendering equation is an integral, and our job is to approximate the value of the integral by sampling it in the most intelligent way possible.

Importance sampling tells us that we want to generate samples with a density that's proportional to the contribution of those samples to the value of the final integral. (So, where the light is strongest, sample that with highest density.) We can't directly produce samples with this probability density function, though - if we could, we could just compute the integral directly rather than dealing with all this sampling business

The function being integrated in the rendering equation is the product of a few independent functions:

  • The BRDF (BSDF) function, which is a property of the material we are rendering,
  • The distribution of incoming light. For direct illumination, this is distributed over the relevant light sources
  • A geometry term, where the orientation of the surface(s) affects the result
  • A visibility term (the point being shaded might be in shadow)
The fact that there are a bunch of independent terms means that Multiple Importance Sampling (MIS) works well - we can use these independent functions to produce a single aggregated "target" function which we expect will approximate the real function fairly well. So, we can generate samples according to the target function, using Sequential Importance Resampling (SIR), evaluate the real function at those sampling locations (by tracing rays or whatever), then use Resampled Importance Sampling (RIS) to calculate an integral. Easy peasy, right?
 

ReSTIR


This is where ReSTIR starts. The first observation that ReSTIR makes is that it's possible to use reservoir sampling (RS) to turn this into a streaming algorithm. The paper assumes that the reservoir only holds a single sample (though this isn't actually necessary). The contents of the reservoir represent a set of (one) sample with pdf proportional to the target function, and the more samples the reservoir encounters, the better that pdf matches the target function. The name of the game, now, is to make the reservoir encounter as many samples as possible.

Which brings us to the second observation that ReSTIR makes. Imagine if there was some way of merging reservoirs in constant time (or rather: in time proportional to the size of the reservoirs, rather than time proportional to the number of samples the reservoirs have encountered). If this were possible, you could imagine a classic parallel reduction algorithm: each thread (pixel) could start out with a naive reservoir (a poor approximation of your target function), but then adjacent threads could merge their reservoirs, then one-of-every-4-threads could merge their reservoirs, then one-of-every-8, etc, until you have a single result that incorporates results from all the threads. If only a single level (generation) of this reduction occurs each frame, you end up with a result where you perform a constant amount of work each frame per thread, but the result is that an exponential number of samples end up being accumulated. This is the key insight that ReSTIR makes.
 

Merging Reservoirs


Merging reservoirs is a subtle business, though. The problem is that different pixels/threads are shading different materials oriented at different orientations. In effect, your target function you're sampling (and the real function you're evaluating) are different from pixel to pixel. If you ignore this fact, and pretend that all your pixels are all sampling the same thing, you can naively just jam the reservoirs together, by creating a new reservoir which encounters the values saved in the reservoirs of the inputs. This is fast, but gets wrong results (called "bias" in the literature).

What you have to do instead is to treat the merging operation with care. The key here lies with the concept of "supports." Essentially, if you're trying to sample a function, you have to be able to generate samples at every place the function is nonzero. If there's an area where the function is nonzero but you never sample that area, your answer will turn out wrong. Well, the samples that one pixel generates (recall that the things in the reservoirs are sample locations) might end up not being applicable to a different pixel. For example, consider if there's an occlusion edge where one pixel is in shadow and a nearby pixel isn't. Or, another example: the surface normal varies across the object, and the sample at one pixel is at a sharp angle, such that if you use that same sample at a different pixel, that sample actually points behind the object. You have to account for this in the formulas involved.
 

Jacobian Determinant


There's a generalization of this, which uses the concept of a Jacobian determinant. Recall that, in general, a function describes a relationship between inputs and outputs. The Jacobian determinant of a function describes, for a particular point in the input space of the function, if you make a small perturbation and feed a slightly different input point into the function, how much the output of the function will be perturbed. It's kind of a measure of sensitivity - at a particular point, how sensitive are changes in the output to changes in the input.

Well, if you have a sample at one particular pixel of an image, and you then apply it to a different pixel, you have an input (the sample at the original pixel) and you have an output (the sample at the destination pixel) and you have a relationship between the two (the probability of that sample won't be exactly the same at the two different places). So, the Jacobian tells you how to compensate for the fact that you're changing the domain of the sample.

In order to incorporate the Jacobian, you have to be able to calculate it (of course), which means you have to be able to characterize how sample reuse across pixels affects the probabilities involved. For direct illumination, that's just assumed to be 1 or 0 depending on the value of the sample point - hence why above you just ignore some samples altogether when reusing them. For indirect illumination (path tracing), a sample is an entire path, and when you re-use it at a different pixel, you're producing a new path that is slightly different than the original path. This path manipulation is called "shift mapping" of a path in the gradient domain rendering literature, and common shift mappings have well-defined Jacobian functions associated with them. So, if you spatially reuse a path, you can pick a "shift mapping" for how to define the new path, and then include that shift mapping's Jacobian in the reservoir merging formula.

This concept of a "shift mapping" and its Jacobian can be generalized to any kind of sampling - it's not just for path tracing.

Conclusion


So that's kind of it. If you're careful about it, you can merge reservoirs in closed form (or, at least, in closed form for each sample in the reservoirs), which results in a pdf of the values in the reservoir that are informed by the union of samples of all the input reservoirs. This leads to a computation tree of merges which allows the number of samples to be aggregated exponentially over time, where each frame only has to do constant work per pixel. You can perform this reuse both spatially and temporally, if you remember information about the previous frame. The more samples you aggregate, the closer the pdf of the samples in the reservoir matches the target function, and the target function is formed by using MIS to approximate the rendering equation. This allows you to sample with a density very close to the final function you're integrating, which has the effect of reducing variance (noise) in the output image.

ReSTIR also has some practical concerns, such as all the reuse causing an echo chamber of old data - the authors deal with that by weighting old and new data differently, to try to strike a balance between reuse (high sample counts) vs quickly adhering to new changes in geometry or whatever. It's a tunable parameter.

No comments:

Post a Comment