My first crack at the problem involved blindly assuming that Haskell was creating a gigantic thunk for the entire computation, and then executing the thunk. The logic was that a gigantic thunk would take up a gigantic amount of memory. Forcing Haskell to evaluate this thunk early would get rid of the thunk and would solve all my problems. So, as dictated by Real World Haskell, I went about using the seq function to force Haskell to evaluate my thunks as they were being created.
This didn’t actually turn out to solve my problem. The memory signature didn’t go down at all. After re-reading the relevant chapter in Real World Haskell, I found a paragraph that I had overlooked before. It explains that seq only evaluates its argument to the outermost constructor – weak head normal form. This means that seq will evaluate 3 + 4 and get 7, but will not evaluate the elements of a list, because the (:) operator is itself a constructor (the list type is defined, roughly, as data [a] = a : [a] | []). Calling seq on a list therefore accomplishes almost nothing – it evaluates only as far as the first (:) cell, leaving every element as an unevaluated thunk. Therefore, the strategy of sprinkling seq over my code wasn’t going to work at all.
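A quick way to see this behavior for yourself: a list whose elements are all undefined survives both seq and length, because only the (:) cells ever get evaluated. This is a minimal sketch, assuming GHC; survivesSeq is just a name I made up for the demo.

```haskell
-- seq evaluates only to weak head normal form: the outermost constructor.
-- A list of undefined elements survives seq (and even length), because the
-- (:) cells can be evaluated without ever touching the elements.
survivesSeq :: Bool
survivesSeq =
  let xs = [undefined, undefined] :: [Int]
  in xs `seq` (length xs == 2)  -- forcing an element (e.g. sum xs) would crash

main :: IO ()
main = print survivesSeq
```

Forcing the elements themselves requires something that actually demands them, like sum or a fold with a strict accumulator.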
Then I took a step back. I had assumed that the problem was caused by a giant thunk; what if it wasn’t? I had no evidence either way. Wouldn’t it be nice if Haskell had a profiling tool that could tell me exactly where the memory leak was? I tweeted about the apparent lack of such a tool, and got a very quick reply from a Twitter user named donsbot, who pointed me to the relevant chapter in Real World Haskell on profiling. As it turns out, Haskell has some very advanced tools for system profiling!
Later, I did some searching on donsbot, and found that it belongs to someone who goes by the handle dons, or Don S. A little further into the search, I found that this guy is a co-author of my very own Haskell book, Real World Haskell! I was incredibly geeked out by this – I blindly tweeted, and a co-author of the book I’d been using responded! I immediately started following him on Twitter, which has led me to check out some of the really interesting work he’s doing, currently on the Vector libraries. He is on Stack Overflow, too, and has some very thoughtful and inspiring answers on that site.
Anyway, after running some of these profiling tools, I found that almost all of my memory was going into lists of Doubles. After looking through my code, I immediately found the main problem.
A Perlin Noise generator works by splitting the image into multiple levels. Each level is further split into a grid of blocks, with keypoints pre-generated at the corners of each block. In my previous implementation, I iterated through each level, generating every pixel in that level. I then sorted the pixels to put them in the same order for each level, and used zip to combine all the pixels for a given position into the final value. The problem was the sorting step: in order to sort a list, the entire list has to be memory resident. Therefore, all the pixels in all the levels had to be memory resident at once.
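The residency problem is easy to reproduce in miniature: because sort has to compare every element, even demanding a single element of the sorted result forces the entire input list to be generated and held. A small sketch:

```haskell
import Data.List (sort)

-- Even asking for just the first element of the sorted output forces sort
-- to examine (and therefore materialize) the whole input list; none of it
-- can be garbage-collected until the comparisons are done.
smallest :: [Int] -> Int
smallest = head . sort

main :: IO ()
main = print (smallest [3, 1, 4, 1, 5, 9, 2, 6])
```

Laziness doesn’t help here: the sortedness of the first element depends on every other element, so the whole list is live at once.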
The best way I found to solve this problem was to think of Haskell’s execution model as a streaming framework. If I wrote the program correctly, Haskell would be able to generate the value of a specific pixel, write it out to the image, let the garbage collector reclaim that pixel, and move on to the next one. This requires each pixel to be independent of every other pixel. The way to get around sorting the pixels is to always generate them in the same order.
For example, in the old way, I would generate the pixels block by block. This would mean that, for two different levels with two different block structures, the pixels would be generated in these orders:
For a blocksize of 2x2, the pixels would be generated like this:
For a blocksize of 4x4, the pixels would be generated like this:
I had to alter this so that the pixels would be generated in the same order, regardless of level. If I did that, Haskell could generate the same pixel in each level, apply the operator I used with zip, get the final pixel, and write it out to the image, without ever having to hold any other pixels in memory. This turns the entire program into what can be thought of as a stream – it streams each pixel, one by one.
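Assuming hypothetical per-level generators, the stream-style combination can be sketched with zipWith: each level is a lazy list in the same pixel order, and folding zipWith (+) over the levels produces one combined pixel at a time. levelPixels here is a stand-in for the real Perlin generator; the only property that matters is that every level emits its pixels in the same order.

```haskell
-- Stand-in for a per-level Perlin generator: an infinite stream of pixel
-- values, produced on demand.
levelPixels :: Double -> [Double]
levelPixels amp = map ((* amp) . fromIntegral) [0 :: Int ..]

-- zipWith consumes one pixel from each level, combines them, and lets the
-- garbage collector reclaim them before moving on to the next position.
finalPixels :: [Double]
finalPixels = foldr1 (zipWith (+)) [levelPixels a | a <- [1.0, 0.5, 0.25]]

main :: IO ()
main = print (take 5 finalPixels)
```

Because everything is lazy, only a constant number of cells per level is live at any moment, even though the underlying streams are infinite.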
Modifying the program to do this was fairly straightforward. Loop through the X values; if you come across a new block, generate a new Hermite vector, otherwise reuse the existing one. If you come to the end of a line, go to the next line. If the next line is in a new block, update the current line and block keypoints, then generate the Hermite vector for the new block. This function is even tail-recursive.
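A minimal sketch of that loop, with hermiteFor standing in for the real keypoint/Hermite computation (the name and the per-block value are mine, purely for illustration): the per-block data is rebuilt only when x crosses a block boundary, and guarded recursion lets each pixel be consumed and collected before the next is produced.

```haskell
-- Hypothetical stand-in for generating the Hermite vector of a block;
-- here it just maps a block index to a number.
hermiteFor :: Int -> Double
hermiteFor blockIx = fromIntegral blockIx * 0.5

-- Walk one row left to right, regenerating the block data only at block
-- boundaries. The result is produced lazily, one pixel at a time.
rowPixels :: Int -> Int -> [Double]
rowPixels blockSize width = go 0 (hermiteFor 0)
  where
    go x vec
      | x >= width = []
      | x > 0 && x `mod` blockSize == 0 =
          let vec' = hermiteFor (x `div` blockSize)
          in  vec' : go (x + 1) vec'
      | otherwise = vec : go (x + 1) vec

main :: IO ()
main = print (rowPixels 2 6)
```

The same shape extends to rows: carry the current line’s keypoints in the accumulator and refresh them when y crosses a block boundary.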
The new program certainly has a smaller memory footprint. It is still not quite where I would like to see it, however. The current memory graph shows that it doesn’t quite run in linear space. If I had programmed it correctly, it should run in linear space, so clearly I still have at least one unanticipated bug. A fair amount of memory appears to be held in the list data structure itself – that is, in the cons cells (the ‘next’ pointers) of the linked lists. Perhaps I could alleviate this problem by using Vectors?
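For what it’s worth, the cons-cell overhead can be avoided with an unboxed structure. As one illustration – not necessarily what I’ll end up using – Data.Array.Unboxed from the array package that ships with GHC stores the Doubles flat in memory, with no per-element pointers (the vector package offers a richer interface in the same spirit):

```haskell
import Data.Array.Unboxed (UArray, bounds, listArray, (!))

-- An unboxed array stores the Doubles themselves, contiguously, instead of
-- a chain of cons cells each holding a pointer to a boxed Double.
pixels :: UArray Int Double
pixels = listArray (0, 4) [0.0, 0.1, 0.2, 0.3, 0.4]

main :: IO ()
main = do
  print (pixels ! 2)
  print (bounds pixels)
```

The trade-off is that an array is not a stream: it wants its size up front, so it fits the per-row or per-block data better than the whole-image pipeline.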
Also, I bit the bullet and made the damn thing use LibGD directly.
Here is the current memory graph for the new program:
My code can be found here.