Saturday, April 27, 2024

Avoiding Seemingly-Necessary Retain Cycles

I was recently implementing an API that seemed like it required a retain cycle. Object A has a property of object B, and object B has a property of object A. Customer code is allowed to retain either object A or object B and must be able to access the other one via the property. It seems like this requires a retain cycle to implement, doesn't it!

Specifically, I'm implementing reflection for a programming language compiler. The customer gives a program to the system as a string, and the system compiles their program, but also returns a reflection object that describes the contents of their program. The reflection object consists of a collection of subobjects, each of which represents a part of the customer's program.

So, if the customer's program is something like:

struct Foo {

    Foo* link;

}

There will be a subobject which represents the struct, and there will be a subobject under that which represents the field. The field has a subobject which represents the type of the field, and that object has an accessor which references the inner type of the pointer - which is the Foo struct again. That's the retain cycle.

There is a solution, though, which doesn't actually involve retain cycles. It's something I learned from the WebKit team during my time working on that project. The solution is that you take all the objects in the cycle, and instead of having each object have its own reference count, you make a single reference count for the entire collection. If any customer code wants to reference any subobject, the subobjects are set up to automatically forward it on to the entire collection's single reference count. If you know, then, that every object in the collection has the same reference count and therefore the same lifetime, then all links within the collection (from one subobject to another subobject) can be raw pointers (rather than strong references that would increment the reference count). Indeed, you actually *can't* have a strong reference from one subobject to another, because that would actually mean that the collection has a strong reference to itself.

So, let's break it down.

Step 1: Identify a single owning object under which all of the subobjects live. In my case, that's the top-level "Reflection" object. This object's reference count will represent the reference count for the entire collection of subobjects under it.

Step 2: For each subobject under the reflection object, give it a raw pointer to the owning parent object. This can be a raw pointer, because we are guaranteeing that the lifetime of all the subobjects is equal to the lifetime of the single owning object. We can also have each subobject notify the owning object that it owns the subobject. This will be important later, when the owning object eventually dies, and wants to *actually* destroy each subobject.

Step 3: For each subobject, override its retain and release method to call the owning object's retain and release method instead. This means that any time any customer wants to retain or release a subobject, they end up retaining and releasing the single owning object instead. In this way, the owning object's lifetime is the union of the lifetimes of each of the subobjects.

Step 4: We have to modify any references from any subobject to any other subobject to be a raw pointer, instead of a strong reference. At this point, if there is any strong reference from a subobject to another subobject, that strong reference will be forwarded to the parent object, and the parent object would stay alive forever. So we have to avoid this, and use raw pointers for all links between subobjects.

Step 5: We have to modify the destructor of the parent object to destroy each of the subobjects under it. There are 2 pieces to this. The first piece is that the parent object needs a vector of raw pointers to each of the subobjects. We can build up this vector when the subobject notifies the parent object back in step 2. Also, the vector needs to hold raw pointers (as opposed to strong references) for the same reason that subobjects need to hold raw pointers among themselves - if we don't, the owning object will live forever. The second piece is that there has to be some way for the parent to *actually* destroy a subobject. It can't just call release on the subobject, because that will end up being forwarded back to the owning object. Therefore, all of the subobjects need a new method, "actually release," which actually releases the subobject without forwarding it to the owning object. The destructor of the owning object calls this on each subobject which was registered to it.

Step 6: There's one last gotcha: When subobjects are created, the customer code is going to think they are +1 objects, and release them eventually. Therefore, the constructor of the subobjects need to retain their owner, which balances out this pending release. This looks like a leak, but it's not - the owning object will destroy every subobject when the owning object gets destroyed, and the owning object gets destroyed when the last reference to either itself or any of the subobjects gets destroyed.

It's a little tricky to get it all correct, but it is possible. The API contract *seems* like it would require a retain cycle, but it's actually possible to implement without one, by unifying the retain counts of all the objects in the cycle into a single retain count for the whole group, thereby treating the cycle as a single item.

Monday, March 18, 2024

So I wrote a double delete...

I wrote a double delete. Actually, it was a double autorelease. Here's a fun story describing the path it took to figure out the problem.

I'm essentially writing a plugin to a different app, and the app I'm plugging-in-to is closed-source. So, I'm making a dylib which gets loaded at runtime. I'm doing this on macOS.

The Symptom

When running my code, I'm seeing this:


Let's see what we can learn from this.

First, it's a crash. We can see we're accessing memory that we shouldn't be accessing.

Second, it's inside objc_release(). We can use a bit of deductive reasoning here: If the object we're releasing has a positive retain count, then the release shouldn't crash. Therefore, either we're releasing something that isn't an object, or we're releasing something that has a retain count of 0 (meaning: a double release).

Third, we can actually read a bit of the assembly to understand what's happening. The first two instructions are just a way to check if %rdi is null, and, if so, jump to an address that's later in the function. Therefore, we can deduce that %rdi isn't null.

%rdi is interesting because it's the register that holds the first argument. It's probably a safe assumption to make that objc_release() probably just takes a single argument, and that argument is a pointer, and that pointer is stored in %rdi. This assumption is somewhat-validated by reading the assembly: nothing seems to be using any of the other parameter registers.

The next 3 lines check if the low bit in %rdi is 1 or not. If it's 1, then we again jump to an address that's later in the function. Therefore, we can deduce that %rdi is an even number (its low bit isn't 1).

The next 3 lines load a value that %rdi is pointing to, and mask off most of its bits. The next line, which is the line that's crashing, is trying to load the value that the result points to.

All this makes total sense: Releasing a null pointer should do nothing, and releasing tagged pointers (which I'm assuming are marked by having their low bit set to 1) should do nothing as well. If the argument is an Objective-C object, it looks like we're trying to load the isa pointer, which probably holds something useful at offset 0x20. That's the point where we're crashing.

That leads to the deduction: Either the thing we're trying to release isn't an Objective-C object, or it's already been released, and the release procedure clears (or somehow poisons) the isa value, which caused this crash. Either way, we're releasing something that we shouldn't be releasing.

One of the really useful observations about the assembly is that nothing before the crash point clobbers the value of %rdi. This means that a pointer to the object that's getting erronously released is *still* in %rdi at the crash site.

We can also see that the crash is happening inside AutoreleasePool:

This doesn't indicate much - just that we're autoreleasing the object instead of releasing it directly. It also means that, because autorelease is delayed, we can't see anything useful in the stack trace. (If we were releasing directly instead of autoreleasing, we could see exactly what caused it in the stack trace.)

The First Thing That Didn't Work

The most natural solution would be "Let's use Instruments!" It's supposed to have a tool that shows all the retain stacks and release stacks for every object.

When running with Instruments, we get a nice crash popup showing us that we crashed:

The coolest part about this is that it shows us the register state at the crash site, which gives us %rdi, the pointer to the object getting erroneously released.

Cool, so the object which is getting erroneously released is at 0x600002a87b40. Let's see what instruments lists for that address:

Well, it didn't list anything for that address. It listed something for an address just before it, and just after it, but not what we were looking for. Thanks for nothing, Instruments.

The Second Thing That Didn't Work

Well, I'm allocating and destroying objects in my own code. Why don't I try to add logging all my own objects to see where they all get retained and released! Hopefully, by cross referencing the address of the object that gets erroneously deleted with the logging of the locations of my own objects, I'll be able to tell what's going wrong.

We can do this by overriding the -[NSObject release] and -[NSObject retain] calls:

As well as init / dealloc:

Unfortunately, this spewed out a bunch of logging, but the only thing it told me was the object that was being erroneously released wasn't one of my own objects. It must be some other object (NSString, NSArray, etc.).

The Third Thing That Didn't Work

Okay, we know the object is being erroneously autoreleased. Why don't we log some useful information every time anyone autoreleases anything? We can add a symbolic breakpoint on -[NSObject autorelease].

Here's what it looks like when this breakpoint is hit:

Interesting - so it looks like all calls to -[NSObject autorelease] are immediately redirected to _objc_rootAutorelease(). The self pointer is preserved as the value of the first argument.

If you list the registers at the time of the call, you can see the object being released:

So let's modify the breakpoint to print all the information we're looking for:

Unfortunately, this didn't work because it was too slow. Every time lldb evaluates something, it takes a bunch of time, and this was evaluating 3 things every time anybody wanted to autorelease anything, which is essentially all the time. The closed-source application I'm debugging is sensitive enough, that if anything takes too long, the application just quits.

The Fourth Thing That Didn't Work

Lets try to print out the same information as before, but do it inside the application rather than in lldb. That way, it will be much faster.

The way we can do this is with something called "function interposing." This uses a feature of dyld which can replace a library's function with your own. Note that this only works if you disable SIP and set the nvram variable amfi_get_out_of_my_way=0x1 and reboot.

We can do this to swap out all calls to _objc_rootAutorelease() with our own function.

Inside our own version of _objc_rootAutorelease(), we want to keep track of everything that gets autoreleased. So, let's keep track of a global dictionary, from pointer value to info string.

We can initialize this dictionary inside a "constructor," which is a special function in a dylib which gets run when the dylib gets loaded by dyld. This is a great way to initialize a global.

Inside my_objc_rootAutorelease(), we can just add information to the dictionary. Then, when the crash occurs, we can print the dictionary and find information about the thing that was autoreleased.

However, something is wrong...

The dictionary only holds 315 items. That can't possibly be right - it's inconceivable that only 315 things got autoreleased.

The Fifth Thing That Didn't Work

We're close - we just need to figure out why so few things got autoreleased. Let's verify our assumptions, that [foo autorelease] actually calls _objc_rootAutorelease() by writing such code and looking at its disassembly.

And if you look at the disassembly...

You can see 2 really interesting things: the call to alloc and init got compressed to a single C call to objc_alloc_init(), and the call to autorelease got compressed to a single C call to obc_autorelease(). I suppose the Objective-C compiler knows about the autorelease message, and is smart enough to not invoke the entire objc_msgSend() infrastructure for it, but instead just emits a raw C call for it. So that means we've interposed the wrong function - we were interposing _objc_rootAutorelease() when we should have been interposing objc_autorelease(). So let's interpose both:

This, of course, almost worked - we just have to be super sure that my_objc_autorelease() doesn't accidentally call autorelease on any object - that would cause infinite recursion.

The Sixth Thing That Didn't Work

Avoiding calling autorelease inside my_objc_autorelease() is actually pretty much impossible, because anything interesting you could log about an object will, almost necessarily, call autorelease. Remember that we're logging information about literally every object which gets autoreleased, which is, in effect, every object in the entire world. Even if you call NSStringFromClass([object class]) that will still cause something to be autoreleased.

So, the solution is to set some global state for the duration of the call to my_objc_autorelease(). If we see a call to my_objc_autorelease() while the state is set, that means we're autoreleasing inside being autoreleased, and we can skip our custom logic and just call the underlying objc_autorelease() directly. However, there's a caveat: this "global" state can't actually be global, because Objective-C objects are created and retained and released on every thread, which means this state has to be thread-local. Therefore, because we're writing in Objective-C and not C++, we must use the pthreads API. The pthreads threadspecific API uses a "key" which has to be set up once, so we can do that in our constructor:

Then we can use pthread_setspecific() and pthread_getspecific() to determine if our calls are being nested.

Except this still didn't actually work, because abort() is being called...

The Seventh Thing That Didn't Work

Luckily, when abort() is called, Xcode shows us a pending Objective-C exception:

Okay, something is being set to nil when it shouldn't be. Let's set an exception breakpoint to see what is being set wrong:

Welp. It turns out NSStringFromClass([object class]) can sometimes return nil...

The Eighth Thing That Worked

Okay, let's fix that by checking for nil and using [NSNull null]. Now, the program actually crashes in the right place!

That's more like it. Let's see what the pointer we're looking for is...

Okay, let's look for it in bigDict!

Woohoo! Finally some progress. The object being autoreleased is an NSDictionary.

But that's not enough, though. What we really want is a backtrace. We can't use lldb's backtrace because it's too slow, but luckily macOS has a backtrace() function which gives us backtrace information! Let's build a string out of the backtrace information:

Welp, that's too slow - the program exits. Let's try again by setting frameCount to 6:

So here is the final autorelease function:

Okay, now let's run it, and print out the object we're interested in:

And the bigDict:

Woohoo! It's a great success! Here's the stack trace, formatted:

Excellent! This was enough for me to find the place where I had over-released the object.