Hell Oh Entropy!

_FORTIFY_SOURCE=3 performance

Posted: Jan 05, 2023, 16:28

So early last year I finished implemented everything needed for a fully working _FORTIFY_SOURCE=3 so that disrtributions can use it out of the box. OpenSUSE adopted it almost immediately and Gentoo started the work of adding it to their hardened profile. I proposed to make it the default for Fedora 38 after some tests but people quoted to me this blog post that some guy wrote, telling me that there’s a performance issue. Since my explanations and clarifications in the Fedora wiki or on the Fedora devel list is not sufficient (the feature was approved but the “_FORTIFY_SOURCE=3 has performance overhead” claims don’t seem to stop), here’s a blog post for a blog post, stating conclusively that the performance issue is theoretical and overstated, the guy didn’t know what he was talking about when he wrote it.

That guy is working on a clarification blog post of his own, describing in some more detail why the concern is overblown but he has to jump through editorial hoops of a multi-billion dollar corporation that pays his salary, so his apology to me is going to take a while. Whenever he gets to publish his work, I’ll link it here so that it’s two blog posts against one. Take that!

Comments

That is not a number, that is a freed object

Posted: Apr 20, 2022, 12:01

How many of you have written this kind of code in the past:

o = xmalloc (old_size);
...
n = xrealloc (o, new_size);

if (n != o)
  {
    o = n;
    /* Update other pointers that referred to o or offsets from it.  */
  }

Not uncommon right? We’re not dereferencing the freed o and the pointer is after all, a number and hence should be perfectly safe to check, right? And more optimal too since we’re not updating pointers if it’s not necessary. Well…

Better Fortification

TLDR; I broke this ‘safety’ in my implementation for __builtin_dynamic_object_size in gcc but I’m not wrong, you are! See the last section for why.

Now for those of you interested in the story, it all began with the implementation for __builtin_dynamic_object_size. This builtin was implemented first in clang and promised to be a better __builtin_object_size, which was severely limited by its necessity to emit a constant. That restriction meant that (1) there were many cases where it just couldn’t arrive at a constant size and (2) where it did, it would come up with an upper or lower estimate and not necessarily a precise size. Given that the builtin is primarily used to implement _FORTIFY_SOURCE (there’s a more detailed blog post describing its mechanism out there), this directly reduces the scope of this security protection.

__builtin_dynamic_object_size had deeper implications than just being a dynamic version of __builtin_object_size however, which had led to initial pushback in the gcc community. Now that the implementation is due to come out in gcc 12.1 and is being tested with distribution rebuilds, new and interesting implications are being discovered. One of these (and so far the most fascinating to me) was its impact on using (not dereferencing, mind you) a freed pointer.

How gcc deduces object sizes

The object size computation is largely (there are some caveats here but not important for the purposes of this post) done in a separate pass. The pass runs twice, once very early in the pass chain and finally, near the end of the tree passes. The early run is a hack that tries to record subobject size estimates before subsequent passes simplify subobject references to references to their parent object, thus returning a more precise subobject size. The late run is where the actual fun happens.

The object sizes pass, at a high level, tracks the pointer passed to either __builtin_object_size or __builtin_dynamic_object_size to all possible objects it may point to and subsequently, to the site of their assignment, to derive the size. In the static case (i.e. __builtin_object_size), it tries to come up with either the maximum or the minimum estimate while in the dynamic case it builds a fancy expression that would evaluate to the precise size at that point. Of course, ‘precise’ shouldn’t be taken for granted because there could be future changes that make the expressions imprecise in the interest of broadening coverage. If the pass is unable to deduce a size of any of the target objects of the pointer for any reason (passed through a call, non-constant in the static case, etc.), the call is replaced with (size_t) -1 or (size_t) 0 as appropriate.

I can’t judge what I can’t see

As the pass tracks origins of the pointer in question, it unfortunately does not take into account any uses between the allocation and the reference in the builtin that may alter the nature of the pointer. This means that if the pointer was reallocated between its first allocation and the builtin call, the pass won’t notice unless the pointer was explicitly updated. This is a benign limitation in the static case because for the above example, it would simply compute the maximum of new_size and old_size and return the result. In fact in most real world cases since the reallocation is bound to be dynamic, it would simply bail out, resulting in a missed fortification.

With dynamic sizes though, one will now get the new size for n != o but not for the n == o case. As a result, any fortified function call based on this information will see the old size and abort fearing a buffer overflow even though there technically wasn’t any. This was seen in autogen, which had this precise pattern and hence stumbled when it was built with _FORTIFY_SOURCE=3.

It’s a bug, it’s not a bug…

After a bit of back and forth, Martin Liška very helpfully came up with a contained reproducer that allowed us to see what had actually happened. I had broken a pretty common idiom, which meant that those applications would have false positive aborts, something that hadn’t happened with _FORTIFY_SOURCE before. That is until I found an excuse that I could use to point the finger back at you (which includes past me, who is clearly a different person, no?), the developer!

Object Lifetimes

clang 13 also broke with the test case Martin shared after I altered it a bit to fortify fread. That gave me first relief because clearly whatever I did wrong, the smart folks in the clang community did wrong too. So I wasn’t that stupid. Then of course, there was this, which put our collective ‘stupidity’ into perspective, kinda letting us off the hook:

Section 6.2.4 of the ISO C standard (I’m referring to an April 2011 draft because who even in their right minds pays for their copy?!) has this in point number 2:

The lifetime of an object is the portion of program execution during which storage is
guaranteed to be reserved for it. An object exists, has a constant address, and retains
its last-stored value throughout its lifetime. If an object is referred to outside of its
lifetime, the behavior is undefined. The value of a pointer becomes indeterminate when
the object it points to (or just past) reaches the end of its lifetime.

It clearly states that even the value of the pointer pointing to the object is not reliable after it has been freed, so not only should one avoid dereferencing the pointer after it is freed, they should refrain from using it altogether.

Essentially, the comparison with the old pointer results in undefined behaviour. I don’t think the standards committee intended to invalidate this specific idiom with that language, but it does allow compilers the freedom to make assumptions about pointer validity and this idiom ends up trouncing on it. It is possible for the compiler to look for a dominating realloc and update its expectations for size in very specific cases, but it still remains largely unsupported. It won’t, for example, work in cases where a reallocation has been wrapped in a function without any malloc attribute annotations. In fact, gcc 12 has a new -Wuse-after-free option that warns users of this that I, admittedly, once thought was too harsh.

EDIT 2022-04-21: This spawned a conversation in the rust community and Ralf Jung pointed out a way to think about this in pointer provenance terms and does not rely on the above C standard indeterminate pointer clause. This is very relevant because what the object size pass does in this context is essentially pointer provenance (albeit limited and somewhat incomplete), which makes it natural for it to trip on this implicit assumption of o == n. Continuing to use o (and any pointers derived from it) in this context is incorrect.

Getting better together

I’m going to try and support some of these simple cases in gcc during the gcc 13 cycle but in general, this is undefined behaviour. If your code uses this idiom, you should start weaning away from it if it’s not performance sensitive and unconditionally update pointers once their lifetime ends.

Deploying _FORTIFY_SOURCE=3 more widely has been a learning experience (all owing to Martin Liška’s efforts since he was the one building thousands of packages and reporting bugs!) in the deeper implications that __builtin_dynamic_object_size would have when replacing __builtin_object_size. Another interesting implication was the misuse of malloc_usable_size and equivalent interfaces that we discovered with systemd and jemalloc that open up deeper design questions for malloc interfaces. More on that in a separate post either here or on one of the Red Hat blogs.

A simple change of more precise object sizes and wider coverage ended up not only weeding out actual overflows, but also some interesting corner cases and “adventurous” programming practices. I’m going to start rolling some of this out into Fedora near the end of the year and we’ll hopefully have better mitigations in Linux distributions very soon.

Comments

Higher order functions in C?

Posted: Jun 10, 2012, 10:01

The Structure and Interpretation of Computer Programs course has a class on higher order functions, which is the ability of a function to accept a function and/or returns another function that uses the input function. The C programming language has very limit capability to do this and it is limited to being able to accept function pointers or return function pointers. I wanted to explore this limitation further to figure out exactly how far I can stretch this, so I wrote a small C program that tries to emulate this concept. This is just random tinkering, so if the reader is looking for a specific lesson, then let me clarify that there is none.

I wrote a program to get square roots using the Newton-Raphson method -- the same one that was used in the course with an attempt to keep the structure as close as possible to the scheme example in the class video. The Lisp-based listing for the course can be found on the internet and here is what I wrote:

#include <stdio.h>
#include <stdlib.h>
typedef double (*generic_func_t) (double);
#define dx 0.000001
#define absolute(n) ((n) < 0 ? -(n) : (n))
generic_func_t deriv(generic_func_t f)
{
        double deriv_func(double x)
        {
                return (f(x + dx) - f(x)) / dx;
        }
        return deriv_func;
}
double fixed_point(generic_func_t f, double x)
{
        while (absolute(f(x) - x) > dx)
                x = f(x);       
        return f(x);                            
}
double newton(generic_func_t f, double guess)
{
        double newton_func(double y)
        {
                generic_func_t df = deriv(f);
                return (y - f(y) / df (y));
        }
        return fixed_point(newton_func, guess);
}
double sqrt(double x)
{
        double sqrt_func(double y)
        {
                return (x - y*y);
        }
        return newton(sqrt_func, 1);
}
int main(int argc, char **argv)
{
        if (argc < 2) {
                printf(“Usage: %s <number>\n”, argv[0]);
                exit (1);
        }
        printf (“%lf\n”, sqrt(strtod(argv[1], NULL)));
        return 0;
}

The first thing that is evident in the above is that the syntax is non-standard. I have declared some functions within another function and that is not allowed by the C standard. GCC however implements this concept of nested functions as an extension, so this compiled cleanly for me. Running this however is a completely different thing.

Executing the above program for any argument results in segfaults and analysis of the program under gdb shows that the generated code is quite ridiculous. This is expected however, since we have pushed the gcc nested function support too far. The nested functions within gcc are OK as long as they remain within the following limits:

They are only used within the function that nests them, either directly or indirectly via some function calls. Essentially, control from the nested function should return before it’s nesting function returns.
If they’re used outside the nesting function and called after the nesting function returns, then they don’t use any local variables or arguments of the nesting function

Of course, if (2) is true then the function need not have been nested in the first place. Looking at the listing above, the functions sqrt_func, newton_func and deriv_func are the nested functions. Of these however, only deriv_func is actually returned as a function to be used from within newton_func. This is wrong because it breaks (2). deriv_func uses the input argument f(x) which is essentially sqrt_func and is actually called after the deriv() function returns. Other than that however, all of our ‘functional-type’ code looks sane and within what is supported by gcc.

So I modified the deriv function and came up with the listing below:

#include <stdio.h>
#include <stdlib.h>
typedef double (*generic_func_t) (double);
#define dx 0.000001
#define absolute(n) ((n) < 0 ? -(n) : (n))
double deriv(double x, generic_func_t f)
{
        return (f(x + dx) - f(x)) / dx;
}
double fixed_point(generic_func_t f, double x)
{
        while (absolute(f(x) - x) > dx)
                x = f(x);
        return f(x);
}
double newton(generic_func_t f, double guess)
{
        double newton_func(double y)
        {
                double dy = deriv(y, f);
                return (y - f(y) / dy);
        }
        return fixed_point(newton_func, guess);
}
double sqrt(double x)
{
        double sqrt_func(double y)
        {
                return (x - y*y);
        }
        return newton(sqrt_func, 1);
}
int main(int argc, char **argv)
{
        if (argc < 2) {
                printf(“Usage: %s <number>\n”, argv[0]);
                exit (1);
        }
        printf (“%lf\n”, sqrt(strtod(argv[1], NULL)));
        return 0;
}

The only change above is that instead of accepting a function and returning another in deriv, I now accept a function and return the evaluation of the derived function for the specified value. This is highlighted in bold above. This program now builds with gcc and also runs correctly. It should similarly be possible to eliminate the other nested functions to get a much more compact and standards-compliant program.

Comments