I got to thinking about this and discovered that the bounding box hardly changes between a particular pixel and the next pixel. Because the loop iterates over pixels that are next to each other, I could keep track of the bounding box between calls, and now only need to check to see if the bounding box changes. However, once an entire row of pixels has been generated, the sequence of bounding boxes repeats.
It would be nice to think about the problem backwards and, instead of finding the bounding box that a particular pixel is in, find all the pixels that are in a specific bounding box. That way, I could iterate over the bounding boxes, and just sort the result at the end (to re-order the values into line-by-line order). Iterating over the bounding boxes is easy in two dimensions – split the list of keypoints into a matrix of keypoints, and iterate across the y dimension 2 at a time, and inside that, iterate across the x dimension 2 at a time. If the matrix is implemented as a list of lists, the iteration over y can be done by iterating over the main list, and the iteration over x can be done by iterating over the lists inside the main list. Because this is done iteratively, it can be done quickly. Iterating over the pixels in a block is similarly easy.
Then, I got to thinking about the bicubic interpolation performed at each step. It occurred to me that I was finding a dot product between a vector of coefficients and a Hermite vector. Looking at the code to generate the Hermite vector, I realized that this code only depends on the values of the local bounding box. That is to say, all the values within a single block share a Hermite vector. An exploitation of this fact was easy to calculate given the new structure of the program – a new Hermite vector was calculated inside the iteration over local blocks, and this vector was passed to the interpolation routine for each pixel in the block.
I also noticed that the output images were looking quite blocky, as you would expect from filling in a matrix of blocks of values. I realized, though, that one way to somewhat alleviate this problem would be to actually calculate derivatives. Previously, I was assuming that the derivatives at each keypoint were 0. This means that if successive keypoints have values of 0, 3, 6, 9, and 12, the output would look blocky because I was pretending the derivatives at the keypoints were all 0. This would lead to an output that was flat at each of the keypoints, and grew quickly between keypoints.
Calculating derivatives was a bit tricky. Three derivatives are necessary for each point – the partial x derivative, the partial y derivative, and the cross derivative. Early on, I decided that the cross derivative would be too difficult to determine, so I left that one 0. Thus, all that was left were the two partial derivatives. The partial x derivative for a point P should be calculated as rise/run for the two points immediately adjacent to P in the x dimension. However, if P is the left-most point in the row, P should be used instead of a point immediately to the left of P (because such a point does not exist). The opposite is true for the right-most point. I created a tricky function to do this.
Finding all the partial x derivatives involves turning the list of keypoints into a matrix, then mapping the iteration across each row of the matrix. Finding the partial y derivatives involves exactly the same thing, except done with the transpose of the matrix. Zipping everything back together is a bit tricky, but it isn’t too bad to make sure everything is done correctly.
The resulting image is still fairly blocky, but it appears to be a bit better. According to my calculations, introducing the optimizations increases the speed of the program by more than 1000x. This is more than I ever hoped.
However, there is a problem. Generating an 800x600 image with a depth of 4 levels of keypoints takes 10 gigabytes of memory (Good thing I have 12 in my PC!). I am not quite sure what all that memory is for, but I believe that either:
- Haskell is building a gigantic thunk, and then evaluating it, or
- Because the program is not tail recursive, it is holding an intermediate level of 480,000 pixel values at each 480,000 levels of recursion (one for each pixel in 800x600)