Hell Oh Entropy!

Life, Code and everything in between

_FORTIFY_SOURCE=3 performance

Posted: Jan 05, 2023, 16:28

So early last year I finished implementing everything needed for a fully working _FORTIFY_SOURCE=3 so that distributions can use it out of the box. OpenSUSE adopted it almost immediately and Gentoo started the work of adding it to their hardened profile. I proposed to make it the default for Fedora 38 after some tests but people quoted to me this blog post that some guy wrote, telling me that there’s a performance issue. Since my explanations and clarifications in the Fedora wiki or on the Fedora devel list are not sufficient (the feature was approved but the “_FORTIFY_SOURCE=3 has performance overhead” claims don’t seem to stop), here’s a blog post for a blog post, stating conclusively that the performance issue is theoretical and overstated; the guy didn’t know what he was talking about when he wrote it.

That guy is working on a clarification blog post of his own, describing in some more detail why the concern is overblown but he has to jump through editorial hoops of a multi-billion dollar corporation that pays his salary, so his apology to me is going to take a while. Whenever he gets to publish his work, I’ll link it here so that it’s two blog posts against one. Take that!


That is not a number, that is a freed object

Posted: Apr 20, 2022, 12:01

How many of you have written this kind of code in the past:

o = xmalloc (old_size);
...
n = xrealloc (o, new_size);

if (n != o)
  {
    o = n;
    /* Update other pointers that referred to o or offsets from it.  */
  }

Not uncommon right? We’re not dereferencing the freed o and the pointer is after all, a number and hence should be perfectly safe to check, right? And more optimal too since we’re not updating pointers if it’s not necessary. Well…

Better Fortification

TL;DR: I broke this ‘safety’ in my implementation for __builtin_dynamic_object_size in gcc but I’m not wrong, you are! See the last section for why.

Now for those of you interested in the story, it all began with the implementation for __builtin_dynamic_object_size. This builtin was implemented first in clang and promised to be a better __builtin_object_size, which was severely limited by its necessity to emit a constant. That restriction meant that (1) there were many cases where it just couldn’t arrive at a constant size and (2) where it did, it would come up with an upper or lower estimate and not necessarily a precise size. Given that the builtin is primarily used to implement _FORTIFY_SOURCE (there’s a more detailed blog post describing its mechanism out there), this directly reduces the scope of this security protection.

__builtin_dynamic_object_size had deeper implications than just being a dynamic version of __builtin_object_size however, which had led to initial pushback in the gcc community. Now that the implementation is due to come out in gcc 12.1 and is being tested with distribution rebuilds, new and interesting implications are being discovered. One of these (and so far the most fascinating to me) was its impact on using (not dereferencing, mind you) a freed pointer.

How gcc deduces object sizes

The object size computation is largely (there are some caveats here but not important for the purposes of this post) done in a separate pass. The pass runs twice, once very early in the pass chain and finally, near the end of the tree passes. The early run is a hack that tries to record subobject size estimates before subsequent passes simplify subobject references to references to their parent object, thus returning a more precise subobject size. The late run is where the actual fun happens.

The object sizes pass, at a high level, tracks the pointer passed to either __builtin_object_size or __builtin_dynamic_object_size to all possible objects it may point to and subsequently, to the site of their assignment, to derive the size. In the static case (i.e. __builtin_object_size), it tries to come up with either the maximum or the minimum estimate while in the dynamic case it builds a fancy expression that would evaluate to the precise size at that point. Of course, ‘precise’ shouldn’t be taken for granted because there could be future changes that make the expressions imprecise in the interest of broadening coverage. If the pass is unable to deduce a size of any of the target objects of the pointer for any reason (passed through a call, non-constant in the static case, etc.), the call is replaced with (size_t) -1 or (size_t) 0 as appropriate.

I can’t judge what I can’t see

As the pass tracks origins of the pointer in question, it unfortunately does not take into account any uses between the allocation and the reference in the builtin that may alter the nature of the pointer. This means that if the pointer was reallocated between its first allocation and the builtin call, the pass won’t notice unless the pointer was explicitly updated. This is a benign limitation in the static case because for the above example, it would simply compute the maximum of new_size and old_size and return the result. In fact in most real world cases since the reallocation is bound to be dynamic, it would simply bail out, resulting in a missed fortification.

With dynamic sizes though, one will now get the new size for n != o but not for the n == o case. As a result, any fortified function call based on this information will see the old size and abort fearing a buffer overflow even though there technically wasn’t any. This was seen in autogen, which had this precise pattern and hence stumbled when it was built with _FORTIFY_SOURCE=3.

It’s a bug, it’s not a bug…

After a bit of back and forth, Martin Liška very helpfully came up with a contained reproducer that allowed us to see what had actually happened. I had broken a pretty common idiom, which meant that those applications would have false positive aborts, something that hadn’t happened with _FORTIFY_SOURCE before. That is until I found an excuse that I could use to point the finger back at you (which includes past me, who is clearly a different person, no?), the developer!

Object Lifetimes

clang 13 also broke with the test case Martin shared after I altered it a bit to fortify fread. That gave me some relief at first because clearly whatever I did wrong, the smart folks in the clang community did wrong too. So I wasn’t that stupid. Then of course, there was this, which put our collective ‘stupidity’ into perspective, kinda letting us off the hook:

Section 6.2.4 of the ISO C standard (I’m referring to an April 2011 draft because who even in their right minds pays for their copy?!) has this in point number 2:

The lifetime of an object is the portion of program execution during which storage is
guaranteed to be reserved for it. An object exists, has a constant address, and retains
its last-stored value throughout its lifetime. If an object is referred to outside of its
lifetime, the behavior is undefined. The value of a pointer becomes indeterminate when
the object it points to (or just past) reaches the end of its lifetime.

It clearly states that even the value of the pointer pointing to the object is not reliable after it has been freed, so not only should one avoid dereferencing the pointer after it is freed, they should refrain from using it altogether.

Essentially, the comparison with the old pointer results in undefined behaviour. I don’t think the standards committee intended to invalidate this specific idiom with that language, but it does allow compilers the freedom to make assumptions about pointer validity and this idiom ends up trampling on it. It is possible for the compiler to look for a dominating realloc and update its expectations for size in very specific cases, but it still remains largely unsupported. It won’t, for example, work in cases where a reallocation has been wrapped in a function without any malloc attribute annotations. In fact, gcc 12 has a new -Wuse-after-free option that warns users of this that I, admittedly, once thought was too harsh.

EDIT 2022-04-21: This spawned a conversation in the rust community and Ralf Jung pointed out a way to think about this in pointer provenance terms that does not rely on the above C standard indeterminate pointer clause. This is very relevant because what the object size pass does in this context is essentially pointer provenance (albeit limited and somewhat incomplete), which makes it natural for it to trip on this implicit assumption of o == n. Continuing to use o (and any pointers derived from it) in this context is incorrect.

Getting better together

I’m going to try and support some of these simple cases in gcc during the gcc 13 cycle but in general, this is undefined behaviour. If your code uses this idiom, you should start weaning away from it if it’s not performance sensitive and unconditionally update pointers once their lifetime ends.

Deploying _FORTIFY_SOURCE=3 more widely has been a learning experience (all owing to Martin Liška’s efforts since he was the one building thousands of packages and reporting bugs!) in the deeper implications that __builtin_dynamic_object_size would have when replacing __builtin_object_size. Another interesting implication was the misuse of malloc_usable_size and equivalent interfaces that we discovered with systemd and jemalloc that open up deeper design questions for malloc interfaces. More on that in a separate post either here or on one of the Red Hat blogs.

A simple change of more precise object sizes and wider coverage ended up not only weeding out actual overflows, but also some interesting corner cases and “adventurous” programming practices. I’m going to start rolling some of this out into Fedora near the end of the year and we’ll hopefully have better mitigations in Linux distributions very soon.


Fedora Activity Day at Pune: Towards a more secure Fedora

Posted: Nov 02, 2014, 07:10

Huzaifa had wanted to do a Security FAD in Pune for a while to tackle the really high number of open security bugs in Fedora. We had initially set a date for September but we pushed it forward since Huzaifa was not available. In the end, Huzaifa was not available even on the rescheduled date, so PJP took over ownership of the event.

I wasn’t expecting a lot of people to attend given the nature of the activity and as it turned out, there were 14 signups with 7 showing up finally. We also had a few people joining remotely, which was awesome. We also had a Docker event running in parallel at the venue (the Red Hat Pune office), so we had more company at lunch.

Everyone barring PJP came in on India Standard Time, i.e. late by a few minutes to an hour or so. We started a bit late as a result, with a quick introduction to security in Fedora by PJP. After the talk and questions we didn’t waste any time and quickly got down to triaging security bugs. Our plan of action was to take ownership (by setting fst_owner= in the bugzilla whiteboard) of security bugs we understand and start working on driving them to conclusion. What this implied was that we would have to follow up after the FAD to ensure that the bugs were closed.

I started from the oldest bugs (dating back to 2011!) and managed to own 8 bugs by the end of the day. We had many a spirited discussion over what constituted a security bug (most of us understood OS security to a fair extent, but were not security experts) and my impression was that all of us went home a bit wiser. I learned that xen is a horrible horrible package - it bundles a bazillion projects into itself, due to which fixing flaws in the original project is not sufficient and xen would need to be checked and fixed separately.

Overall we had a pretty good day where 36 bugs got new owners - we managed to reduce the total backlog (of unowned bugs) from 370 to 334. Hopefully some of us will continue to work in our spare time (I know I’ll try) and bring that backlog down further.


Blank lines in /etc/netgroup

Posted: Jan 27, 2014, 11:36

While doing a code audit, I noticed that netgroups queries resulted in warnings in valgrind on my system:

==14597== Invalid read of size 1
==14597==    at 0xBB735E0: _nss_files_setnetgrent (files-netgrp.c:106)
==14597==    by 0x4F4954E: __internal_setnetgrent_reuse (getnetgrent_r.c:139)
==14597==    by 0x4F49879: setnetgrent (getnetgrent_r.c:181)
==14597==    by 0x4033EA: netgroup_keys (getent.c:493)
==14597==    by 0x402370: main (getent.c:1011)
==14597==  Address 0x51fe63f is 1 bytes before a block of size 120 alloc'd
==14597==    at 0x4C2A45D: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==14597==    by 0x4EA403D: getdelim (iogetdelim.c:66)
==14597==    by 0xBB73575: _nss_files_setnetgrent (stdio.h:117)
==14597==    by 0x4F4954E: __internal_setnetgrent_reuse (getnetgrent_r.c:139)
==14597==    by 0x4F49879: setnetgrent (getnetgrent_r.c:181)
==14597==    by 0x4033EA: netgroup_keys (getent.c:493)
==14597==    by 0x402370: main (getent.c:1011)

The code was very obviously buggy too:

         ...
         while (line[curlen - 1] == '\n' && line[curlen - 2] == '\\')
           {
             /* Yes, we have a continuation line.  */
             if (found)
             ...

So if line has just the newline character, curlen will be 1 and hence, line[curlen - 2] will be the byte before the array. This kind of thing is never caught normally because of two factors:

  1. The byte before line is part of the malloc metadata
  2. The byte is read and not written to, hence there's no question of corrupting data and triggering one of malloc's consistency checking mechanisms.

Hence the code never crashes and we think that everything is just fine until the byte preceding line just happens to be 0x5c, i.e. ‘\’. That will again never happen in the normal case since it would mean that the size of the line is at least 0x5c00000000000000 on a 64-bit system. However, even if this happens, it may not do a lot of harm since all it does is concatenate an empty string to the subsequent netgroup line.

However, consider a scenario where a netgroups file was generated in a kickstart like this:

{ for i in foo bar baz; do
    echo "${i}_netgroup \\"
    for j in $(seq 1 10); do
        echo "($i, user_$j,) \\"
    done
    # Guess what! I can get away with the \ in the end if I leave a blank line!
    # How Cool is that!
    echo ""
done } > /etc/netgroup

Here, the administrator has found a ‘feature’ where leaving blank lines allows them to get away with their very simple script to generate netgroups with trailing backslashes. Now what happens if line happens to have a preceding 0x5c? The groups separated by the blank lines will get concatenated and all of a sudden you have one group with all of the entries being members of the first group!

So if an attacker manages to control this preceding byte in a program (possibly through a different heap overflow bug), (s)he could manipulate netgroup memberships and cause a breach. So we have a security vulnerability on our hands! Or do we?

Not such a big deal after all

So while it is fun to imagine scenarios to exploit code (who knows when you’ll get a hit!), this one seems like just a harmless buglet. Here’s why:

Controlling the preceding byte in such a manner is extremely difficult, since the malloc consistency checker should kick in before line is even allocated this offending chunk of memory. One may argue that malloc could be overridden and the preceding byte could be tailored to one’s needs, but that immediately takes out the interesting attack scenarios, i.e. the setuid-root programs. If you’re doing authentication using a non-setuid command line program then you’ve got more serious issues anyway because one could simply override getnetgrent_r or similar user/group/netgroup browsing functions and bypass your check.

Finally, this ‘exploit’ depends on the netgroups file being wrong, which is again extremely unlikely and even if it is, that’s a problem with the configuration and not a vulnerability in the code.

In the end it is just a trivial off-by-one bug which must be fixed, so it was.


FUDCon Notes on my Security Exploits Session

Posted: Nov 10, 2011, 10:17

I was asked by a couple of people for notes on the security exploits session that I conducted at FUDCon. I had posted the code samples on the talk page, but that is probably a little terse, so here’s a little write-up to support the code samples. To repeat what I had said multiple times earlier: I am not a security researcher, not even a security freak. This topic was suggested to me by Amit Shah, and I developed an interest in it due to my original interest, which is operating systems tools. The preparation of this talk got me interested in security, but only through the perspective of operating systems tools and programs, so I am still relatively indifferent to the subject of web-based security.

I started preparing for the session fairly late; i.e. 2 days before FUDCon. I am a little familiar with glibc code and with the way the compiler, linker, loader, etc. work on Linux, so that helped me understand a lot of the concepts behind exploits fairly easily. But concepts != working code and getting exploit code to work was the real challenge, especially when I had just about 3 evenings+nights for it. I had started with an idea of showing stack smashing and privilege escalation examples, but given the time constraint, audience level (college students) and also the constraint of my knowledge, I decided to restrict it to stack based attacks. All of the examples have a buffer which is being written to without checking for bounds of that buffer, typically with an strcpy.

The shellcode sample:

The shellcode sample as well as the final vulnerability demo (smash.c and vulnerable.c) were derived from the article Smashing the Stack for Fun and Profit. That is a great article that explains, in much more detail than I went into in my session, how the shellcode exploit can be developed. The core idea of this is:

The exploit is fairly straightforward, except that the instructions no longer work as is on Linux. These instructions require that the process image is set up in a manner that the page mapped to implement the program stack should have execute permissions. By default on recent Linux distributions (I tried this on F-15, but I am certain this should be true for at least F-13, if not earlier), the linker writes out binaries in a manner that the stack, when set up for a process, only has read and write permissions.

I spent a lot of time trying to figure out where this was set and finally found the -z option of the linker. So to write out a binary that sets up an executable stack, I had to call the linker with -z execstack. This finally enabled me to get the shellcode working.

The actual exploit

Once the shellcode was done, I could get the final vulnerability working and I immediately set about trying it. The exploit is based on the above shellcode example, except for one difference. The shellcode example is just that, an example. It is not an actual exploit; it is just a roundabout way to get a shell. The exploit I was about to do was a real crack. The idea now is to accept a string as input, which is then fed in to make a regular and buggy program provide you with a shell.

To imagine how this would work, think of the program that gives you a login prompt. In the context of this exploit, you should be able to input a crafted string into this login prompt and have it give you a shell without actually knowing the password! This is what the actual exploit ought to look like.

Again, writing the exploit was the easy part; getting it to run was quite another thing altogether. The exploit works as follows:

In all of this, there is one assumption that caused the program sample to not work: the assumption that memory maps are at predictable addresses. Recent kernels (quite some time ago actually) have a security mechanism called Address Space Layout Randomization (ASLR) which ensures that memory pages are loaded at random offsets. This meant that our educated guess would no longer work. So to be able to actually do this demo, I would have to disable address space randomization. I do that with:

echo 0 > /proc/sys/kernel/randomize_va_space

Even with this, my example would not execute by itself and would end with a SIGILL. I suspect this has something to do with the fact that my system is x86_64 while the samples are all 32-bit. Our overflow string does not seem to agree with the instruction set on my system. In any case, it seems to run just fine inside a debugger. So if you run smash to get a shell, run gdb vulnerable and then run it with $EGG, you get the shell! At least I had a demo now.

Jump to libc

While I was trying out the shellcode example, I continued thinking about various other ways in which I could get a shell. One of the methods I thought of was to overwrite the return address with the address of the system() glibc call and pass the string via stack. I later found out via Huzaifa that this is in fact a documented way to exploit unchecked buffers on stack. Huzaifa also said that I may be missing out on something there and gave me some tips on finding the right resources for this. I still could not get this working, but at least I found out why the exploit did not work.

This exploit seemed attractive to me because it does not require an executable stack. The instructions I want to execute are already there in memory. So I only have to overwrite the return address and continue writing “/bin/sh” on the stack. I first tried with x86_64 in this case, because I was going by my own idea at that time. I soon figured out that the system() function on x86_64 did not take function arguments from stack. It took the argument from the %rdi register. My devious plan had been foiled! I did not give up however and looked at the system() implementation on i686. This retained the old behaviour of popping arguments from the stack, so my exploit was still possible here.

Not. My code was correct, but every time I ran the program, the address of system() had just 3 bytes set. So it would always look something like: 0x00aabbcc. This was bad news because this meant that I could not continue writing the shell string into the stack (strcpy stops copying when it encounters a 0x0). This means that I can call system() (like I was able to on x86_64 too), but I cannot pass it an argument. After trying enough times, I concluded that this must be a security feature. This was backed up by the tip Huzaifa had shared with me to (ironically) get the exploit to work. This was perhaps the first documentation of a return to libc exploit by Solar Designer. In his explanation, Solar Designer mentioned that a way to fix this would be to ensure a 0x00 in the address, which is precisely what is happening here.

This obviously does not deny the fact that such an exploit can be carried out if you want to call functions that do not have arguments. Think for example, of a function that executes a shell ;)

Conclusion

The last modify example was a simple little trick I wrote on the last day to demonstrate how buffer overflows work and how they can be used to alter program flow. That again is not an exploit at all. At most, it can be called… a buffer overflow ;)

I had even more fun preparing for this session than actually presenting it because it taught me a lot more than I could ever have done by just reading literature. I hope those who attended my session at FUDCon enjoyed the session too.


FUDCon Pune 2011 Day 2: Me, followed by lunch, followed by me, followed by me...

Posted: Nov 07, 2011, 03:12

The title pretty much summarizes what most of my day looked like on day 2 of FUDCon. Well, not exactly, but it comes quite close. I had three sessions lined up in a single day and I was worried that I might lose my voice by the end of it. Ankur Sinha had all of 4 talks in the single day, so I was definitely better off than him.

The day started with Harish Pillay’s keynote on the community architecture team. The turnout on day 2 was less than that on day 1, which was a little surprising. Most of them trickled in later in the day, so it meant that a large number of the attendees in Harish’s talk were Red Hatters and the CoEP volunteers. We probably started a little too early for a Saturday.

Immediately following that was my session on qpid messaging. The attendance in the session was modest (about 8-10 people), but the best part was that they were very involved in the session and that made the session worthwhile. Mrugesh Karnik also joined the session mid-way and asked some really good questions that actually helped my session. We ended up doing a queue design for a fictitious stock trading system and I was able to show how the design could scale very easily with a qpid messaging broker in place. Unfortunately, most of the attendees did not have laptops, so I could not engage them in a hands-on session. In fact, that was my story of the day to a large extent. I had intended all my sessions to be hands-on, but most of it never really materialized because most of the audience did not have laptops.

After the qpid session, I spent some time chatting with Sankarshan, Mrugesh, Anurag and Nisha over lunch. After that I decided to double-check my exploit code samples because it was the one session that I had never done before and it was something that is not my area of expertise. The only aspect of the exploits that I was really comfortable with was how they worked and how I could explain that using the usual tools like gdb, objdump, etc.

I was sitting in the speakers lounge cleaning up my examples when Aditya Patawari came in and asked me about my session. That reminded me that I had to actually go into the session :D We quickly left for the classroom and found pjp finishing up his python session, which had a packed audience. Once he was done, a lot of people left, which led me to think that even this talk was going to have a modest audience. However, people trickled in as I was about to begin and by the time I did begin, the room was full.

The exploits session was probably one of the best sessions I have done so far, mainly because I personally enjoyed it. The audience also consisted of people who were interested (exploits are sexy, as someone said later) and I got a lot of questions during and after the session. The talk also seemed to give some people from the audience the impression that I am a security expert, which is flattering but incorrect.

Then came the awesome part where Pai and Yogesh Babar followed up my session with impromptu sessions, which the audience lapped up eagerly as well. Pai talked about extensibility of postgresql by making it call routines in perl (typical dinosaur stuff ;) ) and Yogesh did a talk on kdump. I learnt later that Rahul Sundaram did something similar in one of the seminar halls by asking the audience to “ask him anything about Fedora and Open Source”. Pretty cool stuff.

After Pai and Yogesh were done, it was again time for me to get on to the platform for another session, this time on autotools. This was something I had done multiple times with the same examples, so it was pretty uneventful.

Day 2 was probably awaited by a lot of people for another reason -- the FUDPub! We went to Park Estique near Vimaan Nagar for dinner. There was loud music and bling bling lights and food and drink. I enjoyed the food and drink; the lights gave me a headache and the loud music was, well, too loud. In any case it was fun chatting with people and having the really good food.

Like the first day, I did not get to attend any other sessions, this time for a different reason. I’ll probably submit fewer sessions in the next conference so that I actually get to attend other sessions and meet and talk to more people. I did meet a lot of interesting people on day 2, so all of that hectic schedule was completely worth it.


FUDCon Pune 2011: Less than 48 hours to go!

Posted: Nov 02, 2011, 18:19

Firstly,

and I’ve got a lot of things to do once I’m there!

FUDCon Pune 2011 has a wonderful line-up of talks and sessions going for it this weekend. I have added three workshops of my own too:

I finally got involved in preparations for FUDCon this week to try and help wherever needed. Rahul suggested I help out with the schedule arrangement, since I had some ideas on the layout of some sessions. We slugged it out for about 4 hours yesterday and another 3 hours today at the Red Hat office, trying to make sure we schedule talks in a manner that attendees can follow tracks (virtualization, security, web apps, embedded, etc.) and at the same time, have enough flexibility to add talks and sessions without having to shuffle things around all the time. Saleem has become a pro at copy-pasting across cells in spreadsheets as a result.

The end result is pretty impressive and we’re expecting even more submissions to keep people busy. There were concerns on whether there were too many sessions running in parallel, but I don’t think that matters. The event attendance is expected to be quite large, so there will be enough audience to keep all speakers busy. Besides, I don’t know a single conference where one gets to attend each and every talk in the conference. The real fun of the conference is to meet friends, collaborators and fellow geeks and not just sitting in rooms and listening to people talk.

Oh, and we have hackfests throughout the day on Sunday. The one I am particularly psyched about is Kushal Das’ libgqpid. It was an idea we had brainstormed about earlier and he wrote a lot of the code in it. Hopefully we can get a release out on Sunday with the core qpid client features ready. For the uninitiated (which is pretty much everyone I guess, since Kushal has not published the code yet), libgqpid is a glib based C wrapper around the qpid C++ client.

Looking forward to having a great time at FUDCon!
