_FORTIFY_SOURCE=3 performance
Posted: Jan 05, 2023, 16:28
So early last year I finished implementing everything needed for a fully working _FORTIFY_SOURCE=3 so that distributions can use it out of the box. OpenSUSE adopted it almost immediately and Gentoo started the work of adding it to their hardened profile. I proposed to make it the default for Fedora 38 after some tests but people quoted to me this blog post that some guy wrote, telling me that there’s a performance issue. Since my explanations and clarifications in the Fedora wiki and on the Fedora devel list have not been sufficient (the feature was approved but the “_FORTIFY_SOURCE=3 has performance overhead” claims don’t seem to stop), here’s a blog post for a blog post, stating conclusively that the performance issue is theoretical and overstated; the guy didn’t know what he was talking about when he wrote it.
That guy is working on a clarification blog post of his own, describing in some more detail why the concern is overblown but he has to jump through editorial hoops of a multi-billion dollar corporation that pays his salary, so his apology to me is going to take a while. Whenever he gets to publish his work, I’ll link it here so that it’s two blog posts against one. Take that!
That is not a number, that is a freed object
Posted: Apr 20, 2022, 12:01
How many of you have written this kind of code in the past:
    o = xmalloc (old_size);
    ...
    n = xrealloc (o, new_size);
    if (n != o)
      {
        o = n;
        /* Update other pointers that referred to o or offsets from it. */
      }
Not uncommon, right? We’re not dereferencing the freed o and the pointer is, after all, a number and hence should be perfectly safe to check, right? And more optimal too since we’re not updating pointers if it’s not necessary. Well…
Better Fortification
TL;DR: I broke this ‘safety’ in my implementation of __builtin_dynamic_object_size in gcc but I’m not wrong, you are! See the last section for why.
Now for those of you interested in the story, it all began with the implementation of __builtin_dynamic_object_size. This builtin was implemented first in clang and promised to be a better __builtin_object_size, which was severely limited by its necessity to emit a constant. That restriction meant that (1) there were many cases where it just couldn’t arrive at a constant size and (2) where it did, it would come up with an upper or lower estimate and not necessarily a precise size. Given that the builtin is primarily used to implement _FORTIFY_SOURCE (there’s a more detailed blog post describing its mechanism out there), this directly reduces the scope of this security protection.
__builtin_dynamic_object_size had deeper implications than just being a dynamic version of __builtin_object_size however, which had led to initial pushback in the gcc community. Now that the implementation is due to come out in gcc 12.1 and is being tested with distribution rebuilds, new and interesting implications are being discovered. One of these (and so far the most fascinating to me) was its impact on using (not dereferencing, mind you) a freed pointer.
How gcc deduces object sizes
The object size computation is largely (there are some caveats here but not important for the purposes of this post) done in a separate pass. The pass runs twice, once very early in the pass chain and finally, near the end of the tree passes. The early run is a hack that tries to record subobject size estimates before subsequent passes simplify subobject references to references to their parent object, thus returning a more precise subobject size. The late run is where the actual fun happens.
The object sizes pass, at a high level, tracks the pointer passed to either __builtin_object_size or __builtin_dynamic_object_size to all possible objects it may point to and subsequently, to the site of their assignment, to derive the size. In the static case (i.e. __builtin_object_size), it tries to come up with either the maximum or the minimum estimate while in the dynamic case it builds a fancy expression that would evaluate to the precise size at that point. Of course, ‘precise’ shouldn’t be taken for granted because there could be future changes that make the expressions imprecise in the interest of broadening coverage. If the pass is unable to deduce a size of any of the target objects of the pointer for any reason (passed through a call, non-constant in the static case, etc.), the call is replaced with (size_t) -1 or (size_t) 0 as appropriate.
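To make the difference between the two builtins a little more concrete, here is a minimal sketch. The function names static_estimate and dynamic_estimate are made up for illustration, and the __has_builtin fallback is my assumption for compilers that do not ship the dynamic builtin. What each call actually returns depends on the compiler version and optimization level, so no particular value should be read as guaranteed.

```c
#include <stddef.h>
#include <stdlib.h>

/* Fall back to the static builtin where the dynamic one is unavailable
   (an assumption for illustration; gcc before 12 does not provide it).  */
#if defined(__has_builtin)
# if __has_builtin(__builtin_dynamic_object_size)
#  define DYNAMIC_OBJECT_SIZE(p) __builtin_dynamic_object_size (p, 0)
# endif
#endif
#ifndef DYNAMIC_OBJECT_SIZE
# define DYNAMIC_OBJECT_SIZE(p) __builtin_object_size (p, 0)
#endif

/* Static query: the result has to fold to a constant, so for a size that is
   only known at run time it typically gives up and yields (size_t) -1.  */
size_t
static_estimate (size_t n)
{
  char *p = malloc (n);
  size_t sz = __builtin_object_size (p, 0);
  free (p);
  return sz;
}

/* Dynamic query: the compiler may build an expression in terms of n here,
   or still return (size_t) -1 if it cannot see enough.  */
size_t
dynamic_estimate (size_t n)
{
  char *p = malloc (n);
  size_t sz = DYNAMIC_OBJECT_SIZE (p);
  free (p);
  return sz;
}
```

Calling both with the same run-time n and printing the results at -O0 and -O2 is a quick way to see how much the answer depends on what the pass can deduce.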
I can’t judge what I can’t see
As the pass tracks origins of the pointer in question, it unfortunately does not take into account any uses between the allocation and the reference in the builtin that may alter the nature of the pointer. This means that if the pointer was reallocated between its first allocation and the builtin call, the pass won’t notice unless the pointer was explicitly updated. This is a benign limitation in the static case because for the above example, it would simply compute the maximum of new_size and old_size and return the result. In fact in most real world cases since the reallocation is bound to be dynamic, it would simply bail out, resulting in a missed fortification.
With dynamic sizes though, one will now get the new size for n != o but not for the n == o case. As a result, any fortified function call based on this information will see the old size and abort fearing a buffer overflow even though there technically wasn’t any. This was seen in autogen, which had this precise pattern and hence stumbled when it was built with _FORTIFY_SOURCE=3.
It’s a bug, it’s not a bug…
After a bit of back and forth, Martin Liška very helpfully came up with a contained reproducer that allowed us to see what had actually happened. I had broken a pretty common idiom, which meant that those applications would have false positive aborts, something that hadn’t happened with _FORTIFY_SOURCE before. That is until I found an excuse that I could use to point the finger back at you (which includes past me, who is clearly a different person, no?), the developer!
Object Lifetimes
clang 13 also broke with the test case Martin shared after I altered it a bit to fortify fread. That gave me my first bit of relief because clearly whatever I did wrong, the smart folks in the clang community did wrong too. So I wasn’t that stupid. Then of course, there was this, which put our collective ‘stupidity’ into perspective, kinda letting us off the hook:
Section 6.2.4 of the ISO C standard (I’m referring to an April 2011 draft because who even in their right minds pays for their copy?!) has this in point number 2:
The lifetime of an object is the portion of program execution during which storage is
guaranteed to be reserved for it. An object exists, has a constant address, and retains
its last-stored value throughout its lifetime. If an object is referred to outside of its
lifetime, the behavior is undefined. The value of a pointer becomes indeterminate when
the object it points to (or just past) reaches the end of its lifetime.
It clearly states that even the value of the pointer pointing to the object is not reliable after it has been freed, so not only should one avoid dereferencing the pointer after it is freed, they should refrain from using it altogether.
Essentially, the comparison with the old pointer results in undefined behaviour. I don’t think the standards committee intended to invalidate this specific idiom with that language, but it does allow compilers the freedom to make assumptions about pointer validity and this idiom ends up trampling on it. It is possible for the compiler to look for a dominating realloc and update its expectations for size in very specific cases, but it still remains largely unsupported. It won’t, for example, work in cases where a reallocation has been wrapped in a function without any malloc attribute annotations. In fact, gcc 12 has a new -Wuse-after-free option that warns users of this that I, admittedly, once thought was too harsh.
EDIT 2022-04-21: This spawned a conversation in the rust community and Ralf Jung pointed out a way to think about this in pointer provenance terms that does not rely on the above C standard indeterminate pointer clause. This is very relevant because what the object size pass does in this context is essentially pointer provenance (albeit limited and somewhat incomplete), which makes it natural for it to trip on this implicit assumption of o == n. Continuing to use o (and any pointers derived from it) in this context is incorrect.
Getting better together
I’m going to try and support some of these simple cases in gcc during the gcc 13 cycle but in general, this is undefined behaviour. If your code uses this idiom, you should start weaning away from it if it’s not performance sensitive and unconditionally update pointers once their lifetime ends.
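One way to update pointers unconditionally is to turn every interior pointer into an offset before the reallocation and rebuild it from the new base afterwards. The sketch below assumes a hypothetical struct buffer with a single interior pointer and uses plain realloc rather than xrealloc; it is an illustration of the idea, not code from autogen or any other project.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical growable buffer with one interior pointer.  */
struct buffer
{
  char *base;    /* start of the allocation */
  char *cursor;  /* points somewhere inside [base, base + size] */
  size_t size;
};

/* Resize buf to new_size (assumed non-zero).  Returns 0 on success and -1 on
   failure, in which case the old block is untouched and still valid.  */
int
buffer_resize (struct buffer *buf, size_t new_size)
{
  /* Turn the interior pointer into an offset while base is still live.  */
  size_t cursor_off = (size_t) (buf->cursor - buf->base);

  char *n = realloc (buf->base, new_size);
  if (n == NULL)
    return -1;

  /* When shrinking, keep the cursor inside the new block.  */
  if (cursor_off > new_size)
    cursor_off = new_size;

  /* No comparison against the old pointer: always rebuild from n.  */
  buf->base = n;
  buf->cursor = n + cursor_off;
  buf->size = new_size;
  return 0;
}
```

The cost over the n != o version is a handful of stores per reallocation, which is usually dwarfed by the reallocation itself.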
Deploying _FORTIFY_SOURCE=3 more widely has been a learning experience (all owing to Martin Liška’s efforts since he was the one building thousands of packages and reporting bugs!) in the deeper implications that __builtin_dynamic_object_size would have when replacing __builtin_object_size. Another interesting implication was the misuse of malloc_usable_size and equivalent interfaces that we discovered with systemd and jemalloc that open up deeper design questions for malloc interfaces. More on that in a separate post either here or on one of the Red Hat blogs.
A simple change of more precise object sizes and wider coverage ended up not only weeding out actual overflows, but also some interesting corner cases and “adventurous” programming practices. I’m going to start rolling some of this out into Fedora near the end of the year and we’ll hopefully have better mitigations in Linux distributions very soon.
Fedora Activity Day at Pune: Towards a more secure Fedora
Posted: Nov 02, 2014, 07:10
Huzaifa had wanted to do a Security FAD in Pune for a while to tackle the really high number of open security bugs in Fedora. We had initially set a date for September but we pushed it forward since Huzaifa was not available. In the end, Huzaifa was not available even on the rescheduled date, so PJP took over ownership of the event.
I wasn’t expecting a lot of people to attend given the nature of the activity and as it turned out, there were 14 signups with 7 showing up finally. We also had a few people joining remotely, which was awesome. We also had a Docker event running in parallel at the venue (the Red Hat Pune office), so we had more company at lunch.
Everyone barring PJP came in on India Standard Time, i.e. late by a few minutes to an hour or so. We started a bit late as a result, with a quick introduction to security in Fedora by PJP. After the talk and questions we didn’t waste any time and quickly got down to triaging security bugs. Our plan of action was to take ownership (by setting fst_owner= in the bugzilla whiteboard) of security bugs we understand and start working on driving them to conclusion. What this implied was that we would have to follow up after the FAD to ensure that the bugs were closed.
I started from the oldest bugs (dating back to 2011!) and managed to own 8 bugs by the end of the day. We had many a spirited discussion over what constituted a security bug (most of us understood OS security to a fair extent, but were not security experts) and my impression was that all of us went home a bit wiser. I learned that xen is a horrible horrible package - it bundles a bazillion projects into itself, due to which fixing flaws in the original project is not sufficient and xen would need to be checked and fixed separately.
Overall we had a pretty good day where 36 bugs got new owners - we managed to reduce the total backlog (of unowned bugs) from 370 to 334. Hopefully some of us will continue to work in our spare time (I know I’ll try) and bring that backlog down further.
Blank lines in /etc/netgroup
Posted: Jan 27, 2014, 11:36
While doing a code audit, I noticed that netgroups queries resulted in warnings in valgrind on my system:
==14597== Invalid read of size 1
==14597==    at 0xBB735E0: _nss_files_setnetgrent (files-netgrp.c:106)
==14597==    by 0x4F4954E: __internal_setnetgrent_reuse (getnetgrent_r.c:139)
==14597==    by 0x4F49879: setnetgrent (getnetgrent_r.c:181)
==14597==    by 0x4033EA: netgroup_keys (getent.c:493)
==14597==    by 0x402370: main (getent.c:1011)
==14597==  Address 0x51fe63f is 1 bytes before a block of size 120 alloc'd
==14597==    at 0x4C2A45D: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==14597==    by 0x4EA403D: getdelim (iogetdelim.c:66)
==14597==    by 0xBB73575: _nss_files_setnetgrent (stdio.h:117)
==14597==    by 0x4F4954E: __internal_setnetgrent_reuse (getnetgrent_r.c:139)
==14597==    by 0x4F49879: setnetgrent (getnetgrent_r.c:181)
==14597==    by 0x4033EA: netgroup_keys (getent.c:493)
==14597==    by 0x402370: main (getent.c:1011)
The code was very obviously buggy too:
    ...
    while (line[curlen - 1] == '\n' && line[curlen - 2] == '\\')
      {
        /* Yes, we have a continuation line. */
        if (found)
    ...
So if line has just the newline character, curlen will be 1 and hence, line[curlen - 2] will be the byte before the array. This kind of thing is never caught normally because of two factors:
- The byte before line is part of the malloc metadata
- The byte is read and not written to, hence there's no question of corrupting data and triggering one of malloc's consistency checking mechanisms.
Hence the code never crashes and we think that everything is just fine until the byte preceding line just happens to be 0x5c, i.e. ‘\’. That will again never happen in the normal case since it would mean that the size of the line is at least 0x5c00000000000000 on a 64-bit system. However, even if this happens, it may not do a lot of harm since all it does is concatenate an empty string to the subsequent netgroup line.
However, consider a scenario where a netgroups file was generated in a kickstart like this:
{
for i in foo bar baz; do
    echo "${i}_netgroup \\"
    for j in $(seq 1 10); do
        echo "($i, user_$j,) \\"
    done
    # Guess what! I can get away with the \ in the end if I leave a blank line!
    # How Cool is that!
    echo ""
done
} > /etc/netgroup
Here, the administrator has found a ‘feature’ where leaving blank lines allows them to get away with their very simple script to generate netgroups with trailing backslashes. Now what happens if line happens to have a preceding 0x5c? The groups separated by the blank lines will get concatenated and all of a sudden you have one group with all of the entries being members of the first group!
So if an attacker manages to control this preceding byte in a program (possibly through a different heap overflow bug), (s)he could manipulate netgroup memberships and cause a breach. So we have a security vulnerability on our hands! Or do we?
Not such a big deal after all
So while it is fun to imagine scenarios to exploit code (who knows when you’ll get a hit!), this one seems like just a harmless buglet. Here’s why:
Controlling the preceding byte in such a manner is extremely difficult, since the malloc consistency checker should kick in before line is even allocated this offending chunk of memory. One may argue that malloc could be overridden and the preceding byte could be tailored to one’s needs, but that immediately takes out the interesting attack scenarios, i.e. the setuid-root programs. If you’re doing authentication using a non-setuid command line program then you’ve got more serious issues anyway because one could simply override getnetgrent_r or similar user/group/netgroup browsing functions and bypass your check.
Finally, this ‘exploit’ depends on the netgroups file being wrong, which is again extremely unlikely and even if it is, that’s a problem with the configuration and not a vulnerability in the code.
In the end it is just a trivial off-by-one bug which must be fixed, so it was.
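A bounds-checked form of that loop condition can be sketched as below. This is only an illustration of the idea and not the patch that went into glibc; ends_with_continuation is a name I made up for it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true if the first curlen bytes of line end in a backslash followed
   by a newline.  The length check comes first, so line[curlen - 2] is only
   read when the line has at least two characters.  */
bool
ends_with_continuation (const char *line, size_t curlen)
{
  return (curlen >= 2
          && line[curlen - 1] == '\n'
          && line[curlen - 2] == '\\');
}
```

With this, a blank line (curlen of 1) is simply not a continuation line, whatever the byte before the buffer happens to be.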
FUDCon Notes on my Security Exploits Session
Posted: Nov 10, 2011, 10:17
I was asked by a couple of people for notes on the security exploits session that I conducted at FUDCon. I had posted the code samples on the talk page, but that is probably a little terse, so here’s a little write-up to support the code samples. To repeat what I had said multiple times earlier: I am not a security researcher, not even a security freak. This topic was suggested to me by Amit Shah, and I developed an interest in it due to my original interest, which is operating systems tools. The preparation of this talk got me interested in security, but only through the perspective of operating systems tools and programs, so I am still relatively indifferent to the subject of web-based security.
I started preparing for the session fairly late; i.e. 2 days before FUDCon. I am a little familiar with glibc code and with the way the compiler, linker, loader, etc. work on Linux, so that helped me understand a lot of the concepts behind exploits fairly easily. But concepts != working code and getting exploit code to work was the real challenge, especially when I had just about 3 evenings+nights for it. I had started with an idea of showing stack smashing and privilege escalation examples, but given the time constraint, audience level (college students) and also the constraint of my knowledge, I decided to restrict it to stack based attacks. All of the examples have a buffer which is being written to without checking for bounds of that buffer, typically with an strcpy.
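For contrast with the unchecked strcpy in those examples, here is a minimal sketch of the check that was missing in each of them. The name copy_checked is hypothetical and this is not code from the session samples.

```c
#include <stddef.h>
#include <string.h>

/* Copy src into dst only if it fits, terminator included.  Returns 0 on
   success and -1 if the copy would overflow, in which case dst is left
   untouched.  */
int
copy_checked (char *dst, size_t dst_size, const char *src)
{
  size_t len = strlen (src);

  if (len >= dst_size)
    return -1;

  memcpy (dst, src, len + 1);
  return 0;
}
```

Every sample in the session boils down to what happens when the len >= dst_size case is allowed to proceed anyway.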
The shellcode sample:
The shellcode sample as well as the final vulnerability demo (smash.c and vulnerable.c) were derived from the article Smashing The Stack For Fun And Profit. That is a great article that explains, in much more detail than I went into in my session, how the shellcode exploit can be developed. The core idea of this is:
- Obtain the binary equivalent of execve (name[0], name, NULL ) where name = {“/bin/sh”, NULL}
- Append additional data to the binary data and pass it to the buffer copy routine (strcpy) in such a way that, it writes the location of the first instruction in the above binary equivalent into the location of the return address
I spent a lot of time trying to figure out where this was set and finally found the -z option of the linker. So to write out a binary that sets up an executable stack, I had to call the linker with -z execstack. This finally enabled me to get the shellcode working.
The actual exploit
Once the shellcode was done, I could get the final vulnerability working and I immediately set about trying it. The exploit is based on the above shellcode example, except for one difference. The shellcode example is just that, an example. It is not an actual exploit; it is just a roundabout way to get a shell. The exploit I was about to do was a real crack. The idea now is to accept a string as input, which is then fed in to make a regular and buggy program provide you with a shell.
To imagine how this would work, think of the program that gives you a login prompt. In the context of this exploit, you should be able to input a crafted string into this login prompt and have it give you a shell without actually knowing the password! This is what the actual exploit ought to look like.
Again, writing the exploit was the easy part; getting it to run was quite another thing altogether. The exploit works as follows:
- Pages of memory are generally mapped at the same addresses for processes, so the top of stack for a process can be a good starting estimate for the top of stack for another process. From that point on, you need a finite number of guesses to get to the point in stack where the instructions have been injected
- The smash program records its own top of stack and prepares a string such that the return address will be overwritten by the top of stack address. The string obviously begins with shell code. To improve the chances of hitting valid instructions in the code, the buffer is written over by a lot of single-byte NOP instructions just before the shellcode instructions
- The smash program executes a shell with an environment variable exported ($EGG). This environment variable can now be used to execute the vulnerable program within that shell. This could have been done differently too, like storing the output to a file and executing the vulnerable program with input from that file. So the way input is given doesn’t really matter here
echo 0 > /proc/sys/kernel/randomize_va_space
Even with this, my example would not execute by itself and would end with a SIGILL. I suspect this has something to do with the fact that my system is x86_64 while the samples are all 32-bit. Our overflow string does not seem to agree with the instruction set on my system. In any case, it seems to run just fine inside a debugger. So if you run smash to get a shell, run gdb vulnerable and then run it with $EGG, you get the shell! At least I had a demo now.
Jump to libc
While I was trying out the shellcode example, I continued thinking about various other ways in which I could get a shell. One of the methods I thought of was to overwrite the return address with the address of the system() glibc call and pass the string via stack. I later found out via Huzaifa that this is in fact a documented way to exploit unchecked buffers on stack. Huzaifa also said that I may be missing out on something there and gave me some tips on finding the right resources for this. I still could not get this working, but at least I found out why the exploit did not work.
This exploit seemed attractive to me because it does not require an executable stack. The instructions I want to execute are already there in memory. So I only have to overwrite the return address and continue writing “/bin/sh” on the stack. I first tried with x86_64 in this case, because I was going by my own idea at that time. I soon figured out that the system() function on x86_64 did not take function arguments from stack. It took the argument from the %rdi register. My devious plan had been foiled! I did not give up however and looked at the system() implementation on i686. This retained the old behaviour of popping arguments from the stack, so my exploit was still possible here.
Not. My code was correct, but every time I ran the program, the address of system() had just 3 bytes set. So it would always look something like: 0x00aabbcc. This was bad news because this meant that I could not continue writing the shell string into the stack (strcpy stops copying when it encounters a 0x0). This means that I can call system() (like I was able to on x86_64 too), but I cannot pass it an argument. After trying enough times, I concluded that this must be a security feature. This was backed up by the tip Huzaifa had shared with me to (ironically) get the exploit to work. This was perhaps the first documentation of a return to libc exploit by Solar Designer. In his explanation, Solar Designer mentioned that a way to fix this would be to ensure a 0x00 in the address, which is precisely what is happening here.
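The effect of that zero byte is easy to see in isolation. The sketch below (strcpy_reach is a made-up helper, and the byte values are placeholders following the 0x00aabbcc pattern above) simply reports how many bytes of an input strcpy would copy before it stops.

```c
#include <stddef.h>
#include <string.h>

/* Number of bytes strcpy would copy from payload before hitting a zero byte,
   not counting the terminator it writes itself.  payload_len bounds the
   search so that the helper never reads past the input.  */
size_t
strcpy_reach (const unsigned char *payload, size_t payload_len)
{
  const unsigned char *nul = memchr (payload, 0x00, payload_len);

  return nul != NULL ? (size_t) (nul - payload) : payload_len;
}
```

For an input of four filler bytes followed by cc bb aa 00 (the little-endian layout of 0x00aabbcc) and then "/bin", this returns 7: nothing after the address is ever copied. As far as I can tell, the terminator that strcpy writes then supplies the top zero byte of the address, which would explain why the call itself still goes through while the argument does not.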
This obviously does not deny the fact that such an exploit can be carried out if you want to call functions that do not have arguments. Think for example, of a function that executes a shell ;)
Conclusion
The last modify example was a simple little trick I wrote on the last day to demonstrate how buffer overflows work and how they can be used to alter program flow. That again is not an exploit at all. At most, it can be called… a buffer overflow ;)
I had even more fun preparing for this session than actually presenting it because it taught me a lot more than I could ever have done by just reading literature. I hope those who attended my session at FUDCon enjoyed the session too.
FUDCon Pune 2011 Day 2: Me, followed by lunch, followed by me, followed by me...
Posted: Nov 07, 2011, 03:12
The title pretty much summarizes what most of my day looked like on day 2 of FUDCon. Well, not exactly, but it comes quite close. I had three sessions lined up in a single day and I was worried that I might lose my voice by the end of it. Ankur Sinha had all of 4 talks in the single day, so I was definitely better off than him.
The day started with Harish Pillay’s keynote on the community architecture team. The turnout on day 2 was less than that on day 1, which was a little surprising. Most people trickled in later in the day, so it meant that a large number of the attendees in Harish’s talk were Red Hatters and the CoEP volunteers. We probably started a little too early for a Saturday.
Immediately following that was my session on qpid messaging. The attendance in the session was modest (about 8-10 people), but the best part was that they were very involved in the session and that made the session worthwhile. Mrugesh Karnik also joined the session mid-way and asked some really good questions that actually helped my session. We ended up doing a queue design for a fictitious stock trading system and I was able to show how the design could scale very easily with a qpid messaging broker in place. Unfortunately, most of the attendees did not have laptops, so I could not engage them in a hands-on session. In fact, that was my story of the day to a large extent. I had intended all my sessions to be hands-on, but most of it never really materialized because most of the audience did not have laptops.
After the qpid session, I spent some time chatting with Sankarshan, Mrugesh, Anurag and Nisha over lunch. After that I decided to double-check my exploit code samples because it was the one session that I had never done before and it was something that is not my area of expertise. The only aspect of the exploits that I was really comfortable with was how they worked and how I could explain that using the usual tools like gdb, objdump, etc.
I was sitting in the speakers lounge cleaning up my examples when Aditya Patawari came in and asked me about my session. That reminded me that I had to actually go into the session :D We quickly left for the classroom and found pjp finishing up his python session, which had a packed audience. Once he was done, a lot of people left, which led me to think that even this talk was going to have a modest audience. However, people trickled in as I was about to begin and by the time I did begin, the room was full.
The exploits session was probably one of the best sessions I have done so far, mainly because I personally enjoyed it. The audience also consisted of people who were interested (exploits are sexy, as someone said later) and I got a lot of questions during and after the session. The talk also seemed to give some people from the audience the impression that I am a security expert, which is flattering but incorrect.
Then came the awesome part where Pai and Yogesh Babar followed up my session with impromptu sessions, which the audience lapped up eagerly as well. Pai talked about extensibility of postgresql by making it call routines in perl (typical dinosaur stuff ;) ) and Yogesh did a talk on kdump. I learnt later that Rahul Sundaram did something similar in one of the seminar halls by asking the audience to “ask him anything about Fedora and Open Source”. Pretty cool stuff.
After Pai and Yogesh were done, it was again time for me to get on to the platform for another session, this time on autotools. This was something I had done multiple times with the same examples, so it was pretty uneventful.
Day 2 was probably awaited by a lot of people for another reason -- the FUDPub! We went to Park Estique near Vimaan Nagar for dinner. There was loud music and bling bling lights and food and drink. I enjoyed the food and drink; the lights gave me a headache and the loud music was, well, too loud. In any case it was fun chatting with people and having the really good food.
Like the first day, I did not get to attend any other sessions, this time for a different reason. I’ll probably submit fewer sessions in the next conference so that I actually get to attend other sessions and meet and talk to more people. I did meet a lot of interesting people on day 2, so all of that hectic schedule was completely worth it.
FUDCon Pune 2011: Less than 48 hours to go!
Posted: Nov 02, 2011, 18:19
Firstly,
and I’ve got a lot of things to do once I’m there!
FUDCon Pune 2011 has a wonderful line-up of talks and sessions going for it this weekend. I have added three workshops of my own too:
- Getting started with autotools: This is intended to be a hands-on session on autotools. I have done this session multiple times before and it has been quite successful with beginners looking to understand the GNU build system.
- Security exploits, Live!: This was originally Amit Shah’s idea, since he wanted someone to do a demo of buffer overflows and similar stuff. You don’t really have to be an aspiring security professional to attend this, since most of the ideas are based on basic programming fundamentals
- Apache Qpid messaging: “Enterprise Applications” are not typically things that you deal with when you’re in college. Quite often one ends up thinking of these things as inaccessible due to high costs and high perceived difficulty level. In this session, I intend to demonstrate that really complicated applications can be written with mind-blowing ease.
The end result is pretty impressive and we’re expecting even more submissions to keep people busy. There were concerns on whether there were too many sessions running in parallel, but I don’t think that matters. The event attendance is expected to be quite large, so there will be enough audience to keep all speakers busy. Besides, I don’t know a single conference where one gets to attend each and every talk in the conference. The real fun of the conference is to meet friends, collaborators and fellow geeks and not just sitting in rooms and listening to people talk.
Oh, and we have hackfests throughout the day on Sunday. The one I am particularly psyched about is Kushal Das’ libgqpid. It was an idea we had brainstormed about earlier and he wrote a lot of the code in it. Hopefully we can get a release out on Sunday with the core qpid client features ready. For the uninitiated (which is pretty much everyone I guess, since Kushal has not published the code yet), libgqpid is a glib based C wrapper around the qpid C++ client.
Looking forward to having a great time at FUDCon!