That is not a number, that is a freed object

by siddhesh on Apr 20, 2022, 12:01

How many of you have written this kind of code in the past:

o = xmalloc (old_size);
...
n = xrealloc (o, new_size);

if (n != o)
  {
    o = n;
    /* Update other pointers that referred to o or offsets from it.  */
  }

Not uncommon right? We’re not dereferencing the freed o and the pointer is after all, a number and hence should be perfectly safe to check, right? And more optimal too since we’re not updating pointers if it’s not necessary. Well…

Better Fortification

TLDR; I broke this ‘safety’ in my implementation for __builtin_dynamic_object_size in gcc but I’m not wrong, you are! See the last section for why.

Now for those of you interested in the story, it all began with the implementation for __builtin_dynamic_object_size. This builtin was implemented first in clang and promised to be a better __builtin_object_size, which was severely limited by its necessity to emit a constant. That restriction meant that (1) there were many cases where it just couldn’t arrive at a constant size and (2) where it did, it would come up with an upper or lower estimate and not necessarily a precise size. Given that the builtin is primarily used to implement _FORTIFY_SOURCE (there’s a more detailed blog post describing its mechanism out there), this directly reduces the scope of this security protection.

__builtin_dynamic_object_size had deeper implications than just being a dynamic version of __builtin_object_size however, which had led to initial pushback in the gcc community. Now that the implementation is due to come out in gcc 12.1 and is being tested with distribution rebuilds, new and interesting implications are being discovered. One of these (and so far the most fascinating to me) was its impact on using (not dereferencing, mind you) a freed pointer.

How gcc deduces object sizes

The object size computation is largely (there are some caveats here but not important for the purposes of this post) done in a separate pass. The pass runs twice, once very early in the pass chain and finally, near the end of the tree passes. The early run is a hack that tries to record subobject size estimates before subsequent passes simplify subobject references to references to their parent object, thus returning a more precise subobject size. The late run is where the actual fun happens.

The object sizes pass, at a high level, tracks the pointer passed to either __builtin_object_size or __builtin_dynamic_object_size to all possible objects it may point to and subsequently, to the site of their assignment, to derive the size. In the static case (i.e. __builtin_object_size), it tries to come up with either the maximum or the minimum estimate while in the dynamic case it builds a fancy expression that would evaluate to the precise size at that point. Of course, ‘precise’ shouldn’t be taken for granted because there could be future changes that make the expressions imprecise in the interest of broadening coverage. If the pass is unable to deduce a size of any of the target objects of the pointer for any reason (passed through a call, non-constant in the static case, etc.), the call is replaced with (size_t) -1 or (size_t) 0 as appropriate.

I can’t judge what I can’t see

As the pass tracks origins of the pointer in question, it unfortunately does not take into account any uses between the allocation and the reference in the builtin that may alter the nature of the pointer. This means that if the pointer was reallocated between its first allocation and the builtin call, the pass won’t notice unless the pointer was explicitly updated. This is a benign limitation in the static case because for the above example, it would simply compute the maximum of new_size and old_size and return the result. In fact in most real world cases since the reallocation is bound to be dynamic, it would simply bail out, resulting in a missed fortification.

With dynamic sizes though, one will now get the new size for n != o but not for the n == o case. As a result, any fortified function call based on this information will see the old size and abort fearing a buffer overflow even though there technically wasn’t any. This was seen in autogen, which had this precise pattern and hence stumbled when it was built with _FORTIFY_SOURCE=3.

It’s a bug, it’s not a bug…

After a bit of back and forth, Martin Liška very helpfully came up with a contained reproducer that allowed us to see what had actually happened. I had broken a pretty common idiom, which meant that those applications would have false positive aborts, something that hadn’t happened with _FORTIFY_SOURCE before. That is until I found an excuse that I could use to point the finger back at you (which includes past me, who is clearly a different person, no?), the developer!

Object Lifetimes

clang 13 also broke with the test case Martin shared after I altered it a bit to fortify fread. That gave me first relief because clearly whatever I did wrong, the smart folks in the clang community did wrong too. So I wasn’t that stupid. Then of course, there was this, which put our collective ‘stupidity’ into perspective, kinda letting us off the hook:

Section 6.2.4 of the ISO C standard (I’m referring to an April 2011 draft because who even in their right minds pays for their copy?!) has this in point number 2:

The lifetime of an object is the portion of program execution during which storage is
guaranteed to be reserved for it. An object exists, has a constant address, and retains
its last-stored value throughout its lifetime. If an object is referred to outside of its
lifetime, the behavior is undefined. The value of a pointer becomes indeterminate when
the object it points to (or just past) reaches the end of its lifetime.

It clearly states that even the value of the pointer pointing to the object is not reliable after it has been freed, so not only should one avoid dereferencing the pointer after it is freed, they should refrain from using it altogether.

Essentially, the comparison with the old pointer results in undefined behaviour. I don’t think the standards committee intended to invalidate this specific idiom with that language, but it does allow compilers the freedom to make assumptions about pointer validity and this idiom ends up trouncing on it. It is possible for the compiler to look for a dominating realloc and update its expectations for size in very specific cases, but it still remains largely unsupported. It won’t, for example, work in cases where a reallocation has been wrapped in a function without any malloc attribute annotations. In fact, gcc 12 has a new -Wuse-after-free option that warns users of this that I, admittedly, once thought was too harsh.

EDIT 2022-04-21: This spawned a conversation in the rust community and Ralf Jung pointed out a way to think about this in pointer provenance terms and does not rely on the above C standard indeterminate pointer clause. This is very relevant because what the object size pass does in this context is essentially pointer provenance (albeit limited and somewhat incomplete), which makes it natural for it to trip on this implicit assumption of o == n. Continuing to use o (and any pointers derived from it) in this context is incorrect.

Getting better together

I’m going to try and support some of these simple cases in gcc during the gcc 13 cycle but in general, this is undefined behaviour. If your code uses this idiom, you should start weaning away from it if it’s not performance sensitive and unconditionally update pointers once their lifetime ends.

Deploying _FORTIFY_SOURCE=3 more widely has been a learning experience (all owing to Martin Liška’s efforts since he was the one building thousands of packages and reporting bugs!) in the deeper implications that __builtin_dynamic_object_size would have when replacing __builtin_object_size. Another interesting implication was the misuse of malloc_usable_size and equivalent interfaces that we discovered with systemd and jemalloc that open up deeper design questions for malloc interfaces. More on that in a separate post either here or on one of the Red Hat blogs.

A simple change of more precise object sizes and wider coverage ended up not only weeding out actual overflows, but also some interesting corner cases and “adventurous” programming practices. I’m going to start rolling some of this out into Fedora near the end of the year and we’ll hopefully have better mitigations in Linux distributions very soon.