Hell Oh Entropy!

Life, Code and everything in between

nullcon 2014

I have always had a peripheral interest in application security, so when some folks at work were discussing attending nullCon (considered India's premier security conference), I decided to join them. As usual, I submitted a talk because if it is selected, it pays for your attendance and makes it easier to interact with more people.

I demoed and spoke about the recent pt_chown vulnerability in Linux and glibc; slides are here. Special thanks to Martin Carpenter for finding this vulnerability and for being available later for advice and help while I was preparing the talk. It was a fairly short talk (I had 20 minutes and finished in about 15, including the one question) and it was the first of the night talks, so I was quickly into attendee mode for the rest of the conference. There was an interesting talk on browser extension security in the night talks track, given by a student, Abhay Rana. It gave an overview of the JavaScript context model of Firefox and Chrome, and then went on to talk about the issue of extension writers asking for more permissions from the framework than they need. Not exactly my primary interest (which is system software and application security as opposed to web-based stuff), but it was interesting nevertheless.

The main conference did not have a lot that interested me greatly: heuristic analysis, penetration testing and fuzzing seemed to be the primary focus, and there was little presented in the Free Software space, i.e. security research on Linux and/or BSD systems and software. I was even more disappointed when I found out that Chris Evans could not make it and was told that another Google engineer would give a replacement talk. Replacement talks are usually very high level, templated and not a lot of fun as a result, but I was in for a surprise. Sumit Gwalani talked about Chrome browser and OS security, and for me it was the best talk of the conference. I had a very useful chat with Sumit later about some aspects of glibc and the memory allocation tweaks that Chrome does.

Other than that, there were a number of hallway talks and discussions with attendees and speakers on interesting topics like reversing programs, binary patching and malware unpacking. Bogmallo beach was probably the most beautiful Goan beach I have visited to date, with friendly people and great food. The Bogmallo beach resort is good, but a bit overpriced.

Comments

GNU C Library 2.19 and what developers can get from it

The GNU C Library project released version 2.19 of its library on Saturday (Friday for some), with Allan McRae as the release manager. Apart from numerous bug fixes, there are a couple of improvements that would interest developers. Both improvements are related in some manner to the library documentation, which apparently is not very well known. In fact, it seems that very few people know that the official documentation for functions in the GNU C Library is not the man pages; it is the GNU C Library Manual. This is not to discredit the man page project in any way of course - Michael Kerrisk does a swell job of keeping the man pages in sync with glibc behaviour wherever necessary, and I'd like to think that we're cooperating to the best of our abilities. The man page project however is more general and covers a fairly broad set of components including the kernel, the C library and some tools. The glibc manual focuses specifically on functionality provided by glibc.

Now for the first big improvement. Alexandre Oliva, along with a number of reviewers, did the mammoth job of adding documentation on multi-thread safety, async-signal safety and async-cancellation safety for functions provided by glibc. This is an invaluable resource because it tries to describe precisely what kind of guarantees the glibc implementation of various functions provides, as opposed to the guarantees documented in the various standards.

The second improvement is the introduction of SystemTap userspace probe markers for various events in malloc and in some mathematical functions. The malloc subsystem is fairly complex and has some critical events that a developer may want to track when profiling the memory allocation patterns of their programs. The probes are placed at these critical points in the malloc subsystem so that one can write a SystemTap script to profile an application much more easily. I had written a description of the malloc internal implementation some time ago, which is still relevant and may help developers select the appropriate probes.

Some mathematical functions try to guarantee accuracy of the result to the last bit and, to do so, some inputs may require multiple precision computation (I have, of course, written about this in a bit more detail in the past). This fallback computation may happen in multiple stages and may take anywhere between 100 and 1,000 times longer than the normal execution time of the function. While the fallback is needed only for a handful of inputs in the entire range, the performance impact is very high when an application does hit this path. So, to help developers identify whether a performance hit is due to these multiple precision fallback paths, the paths have been marked with probe markers that can be used in SystemTap scripts to profile applications. The probes are documented in the libc manual, in the Internal Probes section.

Finally, I have managed to finish producing a significant set of benchmark inputs for the math functions I care about, so it might be a good time for folks to start trying them out and sending in results. The README in the benchtests directory should be a good starting point. The output file format is still not final - I’m toying with JSON as the final format - so expect changes there in future. The string benchmarks still need some attention, which hopefully will happen in the 2.20 time frame.

Looking forward to 2.20, Joseph Myers has already begun the work of moving the architectures in the ports directory to the regular source tree. Once this is complete, we will have no concept of secondary architectures, which is a good thing. Hopefully in future we will also get rid of the libc-ports mailing list, which I have complained about in the past as being an unnecessary separation.

On the benchmarks front, we'll be moving to Python as the language of choice for the scripts and adding features such as graphing, a better file format, scripts to compare benchmark outputs and anything else that catches my fancy during the next 6 months.

Finally, I've mentioned this before - the GNU C Library manual needs contributors, and here's how you can help.

Comments

Blank lines in /etc/netgroup

While doing a code audit, I noticed that netgroups queries resulted in warnings in valgrind on my system:

==14597== Invalid read of size 1
==14597==    at 0xBB735E0: _nss_files_setnetgrent (files-netgrp.c:106)
==14597==    by 0x4F4954E: __internal_setnetgrent_reuse (getnetgrent_r.c:139)
==14597==    by 0x4F49879: setnetgrent (getnetgrent_r.c:181)
==14597==    by 0x4033EA: netgroup_keys (getent.c:493)
==14597==    by 0x402370: main (getent.c:1011)
==14597==  Address 0x51fe63f is 1 bytes before a block of size 120 alloc'd
==14597==    at 0x4C2A45D: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==14597==    by 0x4EA403D: getdelim (iogetdelim.c:66)
==14597==    by 0xBB73575: _nss_files_setnetgrent (stdio.h:117)
==14597==    by 0x4F4954E: __internal_setnetgrent_reuse (getnetgrent_r.c:139)
==14597==    by 0x4F49879: setnetgrent (getnetgrent_r.c:181)
==14597==    by 0x4033EA: netgroup_keys (getent.c:493)
==14597==    by 0x402370: main (getent.c:1011)

The code was very obviously buggy too:

         ...
         while (line[curlen - 1] == '\n' && line[curlen - 2] == '\\')
           {
             /* Yes, we have a continuation line.  */
             if (found)
             ...

So if line has just the newline character, curlen will be 1 and hence line[curlen - 2] will be the byte before the array. This kind of thing normally goes uncaught because of two factors:

  1. The byte before line is part of the malloc metadata
  2. The byte is read and not written to, hence there's no question of corrupting data and triggering one of malloc's consistency checking mechanisms.

Hence the code never crashes and we think that everything is just fine until the byte preceding line just happens to be 0x5c, i.e. '\'. That will pretty much never happen in the normal case, since the byte preceding the allocated block is the most significant byte of the malloc chunk size on a little-endian 64-bit system, which would mean that the line buffer would have to be at least 0x5c00000000000000 bytes in size. However, even if this does happen, it may not do a lot of harm, since all it results in is an empty string being concatenated to the subsequent netgroup line.

However, consider a scenario where the netgroups file was generated by a kickstart script like this:

{ for i in foo bar baz; do
    echo "${i}_netgroup \\"
    for j in $(seq 1 10); do
        echo "($i, user_$j,) \\"
    done
    # Guess what! I can get away with the \ in the end if I leave a blank line!
    # How Cool is that!
    echo ""
done; } > /etc/netgroup

Here, the administrator has found a 'feature' where leaving blank lines lets their very simple netgroup-generating script get away with emitting a trailing backslash on every line. Now what happens if line happens to have a preceding 0x5c? The groups separated by the blank lines get concatenated, and all of a sudden you have one group with all of the entries being members of the first group!

So if an attacker manages to control this preceding byte in a program (possibly through a different heap overflow bug), (s)he could manipulate netgroup memberships and cause a breach. So we have a security vulnerability on our hands! Or do we?

Not such a big deal after all

So while it is fun to imagine scenarios to exploit code (who knows when you’ll get a hit!), this one seems like just a harmless buglet. Here’s why:

Controlling the preceding byte in such a manner is extremely difficult, since the malloc consistency checker should kick in before line is even allocated this offending chunk of memory. One may argue that malloc could be overridden and the preceding byte tailored to one's needs, but that immediately rules out the interesting attack scenarios, i.e. setuid-root programs. If you're doing authentication using a non-setuid command line program then you've got more serious issues anyway, because one could simply override getnetgrent_r or similar user/group/netgroup browsing functions and bypass your check.

Finally, this ‘exploit’ depends on the netgroups file being wrong, which is again extremely unlikely and even if it is, that’s a problem with the configuration and not a vulnerability in the code.

In the end it is just a trivial off-by-one bug that had to be fixed, and so it was.
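The actual fix that went upstream isn't reproduced here, but the shape of it is simply a length check before looking two characters back. Here is a minimal sketch of the corrected condition as a standalone helper (illustrative only, not the upstream patch):

#include <stdbool.h>
#include <stddef.h>

/* A line is a continuation line only if it is at least two characters
   long and ends in a backslash followed by the newline.  The length
   check avoids the out-of-bounds read for a line that contains
   just '\n'.  */
static bool
is_continuation_line (const char *line, size_t curlen)
{
  return curlen >= 2 && line[curlen - 1] == '\n' && line[curlen - 2] == '\\';
}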

Comments

Corporate Email with ActiveSync on KitKat (Nexus 5)

Update: I have now updated the post with a solution that actually works. There is a disclaimer though: future system updates will not update your Email and Exchange packages; you’ll have to always do it manually.

I once had a Samsung Galaxy S (GT-i9000). It was a beautiful phone that I used, rooted and installed Cyanogenmod on (right up to JellyBean) and thoroughly enjoyed. And then I decided that I wanted to fix the GPS contact on the phone using a documented soldering solution. So that’s how I got to order a Nexus 5 :D

Jokes aside, I was thrilled to get my hands on a Nexus 5 last week. My Galaxy S had served me well for many years, mainly due to the great quality hardware of the time, and I chose the Nexus 5 for the same reason - a quad core processor and 2GB of RAM will surely last me a while. There was one problem though, and as I found out, it was a fairly widespread one - corporate email accounts that use ActiveSync would not work. The problem was documented and also fixed, but there was no fix available for the Nexus 5 yet, unless you ran Cyanogenmod.

I had decided to not root my phone for at least six months and even after rooting, not install Cyanogenmod for another six months, so my chances of getting a fix were dependent on Google releasing an update. This looked like a good chance to get my hands on some android patching and building, so I decided to give it a go. To summarize, I did the following:

  • Rooted my phone
  • Located the fix
  • Built the new apks from the AOSP source
  • Removed the old Email apks
  • Installed the new apks

My phone is alive and syncing emails, so here’s a more detailed description of what I did. I am not going to write about rooting the phone or setting up the android development environment. That stuff is well documented - just make sure you get the binary images from reliable sources, like the original website.

Locate the fix

Comment #174 in the bug report pointed to the CyanogenMod patch review system, which had the patch that fixed the problem. The Google engineers unfortunately were not helpful enough to make any such note. I verified in the downloaded AOSP source that these changes were in place. I initially stuck to master and did not bother looking for a specific branch to backport to, because (1) it's Java code, so it ought to be largely device independent, i.e. it shouldn't break anything on my phone if it's wrong, and (2) I expected the apps could be built and installed independently, which they could. Since building the code on master with Java 1.7 did not work, I installed the 1.6 JDK and checked out the android-4.2.2_r1 branch in AOSP. In fact, the code on the 4.2.2_r1 branch will not even build with Java 1.7. On the code end, the CyanogenMod change was slightly different, but the problem was in fact fixed on the 4.2.2_r1 branch with this revision:

commit d92a75c707461188e8743149476e8f49ef191b42
Author: Tony Mantler <email removed>
Date:   Fri Nov 15 12:45:53 2013 -0800

    Make sure the client certificate is always installed
    
    b/11678638
    
    Change-Id: Iafe200d14b72678324758fe08b03c8ea7bb9dc5c

So there was no need to actually patch anything.

Build the packages

Building the individual packages is very simple:

$ make showcommands Email Exchange2

Defining EXPERIMENTAL_USE_JAVA7_OPENJDK allowed me to use OpenJDK to build Java programs instead of the proprietary Oracle Java. showcommands is an additional target that enables verbose output. Email and Exchange2 are the package names. One can find these package names by looking for the LOCAL_PACKAGE_NAME variable in the Android.mk in the package directory ($(srcdir)/packages/apps/).

Remove the old packages

Before removing anything, always back up. It's easy to do this from adb, using the commands:

$ adb pull /system/app/EmailGoogle.apk backup/
$ adb pull /system/app/Exchange2Google.apk backup/

Once this is done, get into adb shell and get root:

$ adb shell
phone:/$ su

You’ll get a prompt on your phone confirming root access, which you need to allow. Now remount the /system filesystem in read-write mode using the following command:

phone:/# mount -o remount,rw /system

Once the remount succeeds, remove the package files using the commands below. Also remove the odex files since they’ll be regenerated for the new packages:

phone:/# rm /system/app/EmailGoogle.apk
phone:/# rm /system/app/EmailGoogle.odex
phone:/# rm /system/app/Exchange2Google.apk
phone:/# rm /system/app/Exchange2Google.odex

Install new packages

Installing the newly built packages is also just as simple. First, copy the packages to your sdcard:

$ adb push out/target/product/generic/system/app/Exchange2.apk /sdcard/
$ adb push out/target/product/generic/system/app/Email.apk /sdcard/

and then copy those packages to /system/app/ and give them appropriate permissions:

$ adb shell
phone:/$ su
phone:/# cp /sdcard/Email.apk /system/app/
phone:/# cp /sdcard/Exchange2.apk /system/app/
phone:/# chmod 644 /system/app/Email.apk
phone:/# chmod 644 /system/app/Exchange2.apk

Reboot your phone and let it optimize the apps (i.e. generate the odex files). ActiveSync should now start working on your phone!

Comments

Yerwada, birding in the heart of the city

There's something exciting about finding a secret spot in the heart of a busy city. Pune is not quite Mumbai in terms of busyness, but it's not far behind. So finding a quiet spot right in the middle of it is quite amazing.

The mention of Yerwada would bring images of dusty wide roads into my mind - the mental hospital and jail don’t figure since I have never seen them. So I was initially a bit surprised when I saw mention of Yerwada as a birding spot in Pune. Looking at the map, it seemed obvious - the Mula-Mutha river flows right next to it, on to Kalyani Nagar. Since it’s not more than a 20-minute bike ride from my place, I decided to go there on a weekday morning.

Finding the birding spot is not very difficult. Look for the crematorium in Yerwada and drive there. You pass the Yerwada garbage collection depot on your left before you see a narrow road, also on the left, going down towards the river. There's a board pointing to the Yerwada Smashan Bhumi. Take that road, but don't go right down to the crematorium. Just before the crematorium there's a narrow road on the left alongside the garbage depot. The road quickly becomes a trail just big enough for a bike or for walking. There, you've entered the birding zone!

The area is quite dry, so the sun gets through rather early. This gives sufficient light to capture inland birds. One can find a trail and walk down to the bank of the river too. There are plenty of spots to park yourself and let the birds come closer. It's quite dirty though, with bones and skeletons of small animals strewn about.

I had a couple of hours to explore as much as I could before going home to go to work. Here are some pictures for your viewing pleasure.

Comments

Watching Birds

So this is my first non-technical post since I migrated all my stuff over to siddhesh.in. It’s not that there’s nothing interesting happening in my life - it’s just that I never felt the urge to write for a very long time. Now I do and I hope it continues for some time.

In the last few years, I developed a new hobby of photographing birds. I would collect bird feathers as a child with Milind and we would try to spot birds, without much success. In fact, without any success. I remember this one time we saw a large bird fly down from a cliff at National Park - we were on top of the cliff. We imagined that it was a rare golden bird - well, I was about 12 and Milind was 14. In hindsight, I guess the bird may have been a Black Kite with the sun right on it, but I’m going to stick to the description of a golden rare bird to preserve the memory of those days.

It's not easy to spot birds, but all it takes is one good birding trip and all of a sudden you see birds all over the place - out the office window, outside your home, while you're walking, while you're driving - everywhere. It's not that they suddenly appeared - they were always there. It's just that your eyes open up to this amazing new world, which is really not new at all.

So despite being able to see birds everywhere, why do we go birding to specific places? I guess the big reason is to get away from other humans and be with the birds - just the birds and you. It’s like they do all those special things just for you. I know it’s not true - that they really don’t even care about me unless I try to get too close - but it feels good to imagine that they like your company as much as you theirs.

It’s funny how this post has developed - I was going to write about a specific birding spot that doesn’t seem to be very popular and yet is in the heart of Pune. I think that will have to wait for another post…

Comments

NSCD Database File Layout

I had to do some analysis of the nscd database file this week, for which I hacked up a quick program to dump its contents. I intend to post the code on the upstream mailing list when it is complete, but for now I have written up a description of the file layout on the glibc wiki, since there isn't any documentation for it.

Comments

Update on libm performance

If you’ve been following glibc development (I don’t think many do; the kernel is sexier apparently) you’ll see that a new NEWS item is in for 2.18:

* Improved worst case performance of libm functions with double inputs and
  output.

If you're wondering whether this has anything to do with my previous few posts on libm, then you're right, it does. Thanks to a number of very interesting changes, the pow function on x86_64 is now roughly 5 times faster in the worst case than it was in 2.17. I have focused on the pow function throughout this work because it is probably the most ill-reputed of the function implementations. I plan to write up a detailed description of the various improvements I made to the code (other than formatting it and fixing the value of TWO) in a separate post or series of posts. To summarize, I have saved time by:

  • Avoiding wasted multiplications by zero in the multiplication function
  • A fast squaring method that is a special case of generic multiplication (sketched after this list)
  • Faster polynomial evaluation for the multiple precision exp function
  • A configurable mantissa type for multiple precision numbers, to allow an integral mantissa for x86 while retaining the floating point mantissa for powerpc
  • A tweak to the multiplication algorithm to reduce multiplications
  • Dozens of minor tweaks to eke out performance
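To give a flavour of the squaring change, here is a rough sketch of the idea (this is not the actual glibc code; the digit layout, types and carry handling are simplified for illustration). Squaring exploits the symmetry of the cross products: x[i] * x[j] equals x[j] * x[i], so each off-diagonal product is computed once and doubled, roughly halving the number of multiplications compared to the generic routine.

#include <stdint.h>

/* Illustrative sketch: square an n-digit mantissa whose digits are
   non-negative integral values below 'radix', stored most significant
   digit first.  The result has 2 * n digits; exponent handling and
   normalization are left out.  */
static void
mp_sqr_sketch (const int64_t *x, int64_t *r, int n, int64_t radix)
{
  for (int k = 0; k < 2 * n; k++)
    r[k] = 0;

  for (int i = 0; i < n; i++)
    {
      /* Diagonal term, computed once.  */
      r[2 * i + 1] += x[i] * x[i];

      /* Each off-diagonal product appears twice in the full square, so
         compute it once and double it.  */
      for (int j = i + 1; j < n; j++)
        r[i + j + 1] += 2 * x[i] * x[j];
    }

  /* Propagate carries so that every digit ends up below the radix.  */
  for (int k = 2 * n - 1; k > 0; k--)
    {
      r[k - 1] += r[k] / radix;
      r[k] %= radix;
    }
}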

But the worst case is a few thousand times slower than the average case; 5x is nothing!

Yes and no. 5x is indeed not a good enough improvement if one is comparing against the average case, but for an implementation that tries to guarantee 0.5 ulp correctness, it is quite impressive. The comparison point for that is multiple precision implementations like mpfr, and we're not far off that target. Anyone who has done some work on math library implementations will tell you that it is currently not possible to predict the worst case precision required to guarantee accuracy for bivariate functions like pow, as a result of which one has to descend into large precisions whenever necessary.

I don't care about exactness, give me something fast and reasonably accurate

I got this a lot from people while working on this and, honestly, it's more a question of project goals than anything else. Currently we're keeping things as they are, i.e. we're going to try to maintain our half-ulp correctness and speed things up within that constraint. Maybe in future we could think of having different variants of the implementations with different performance and accuracy characteristics, but there's nothing like that on the table right now.

Is this it? Will there be more?

There are still a couple of interesting changes pending, the most important of them being a limit on the worst case precision for the exp and log functions, based on the results of the paper Worst Cases for Correct Rounding of the Elementary Functions in Double Precision. I still have to prove that those results apply to the glibc multiple precision bits.

After that there is still a fair bit of scope for improvement, but before that I plan to get the performance benchmarking bits working for at least the major functions in glibc. That will give a standard way to measure performance across architectures and also track it across releases or milestones.

And now for the Call for Contributions!

And now on to what we need help with. Glibc exports a lot of functions and it is nearly impossible for me to write benchmark tests for all of them in the 2.18 timeframe. I guess we'll be happy to go with whatever we get, but if you're looking for some interesting work, adding a function to the benchmark could be it. benchtests/Makefile has instructions on how one can add new benchmarks for functions to the testsuite, and I'd be more than happy to help with any queries anyone may have while doing this - all you have to do is post your query on the libc-help mailing list (libc-help at sourceware dot org).

The benchmark framework itself could use some love. The current implementation is simply based on clock_gettime, which is at best a vDSO function and at worst a system call. It would be really cool to have architecture-specific overrides that do measurements with little or no overhead so that the measurements are as accurate as they possibly can be.
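For instance, on x86_64 one could imagine a TSC-based timestamp. The following is just a sketch of the idea (the function name is hypothetical and this is not the interface the benchmark framework uses; a real version would also have to worry about instruction serialization and frequency scaling):

#include <stdint.h>
#include <time.h>

/* Hypothetical low-overhead timestamp: raw TSC cycles on x86_64, falling
   back to clock_gettime elsewhere.  Cycle counts are only comparable on
   the same machine.  */
static inline uint64_t
bench_timestamp (void)
{
#ifdef __x86_64__
  uint32_t lo, hi;
  __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
  return ((uint64_t) hi << 32) | lo;
#else
  struct timespec ts;
  clock_gettime (CLOCK_MONOTONIC, &ts);
  return (uint64_t) ts.tv_sec * 1000000000ULL + ts.tv_nsec;
#endif
}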

Comments

The glibc manual needs volunteers

The GNU C Library (glibc) needs more contributors to help improve the library. This is even more true for a key portion of the library: the documentation. We have a glibc manual that gets updated on every release, but it is a bit incomplete and has a fair number of bugs filed against it. We would welcome volunteers to take a crack at those bug reports and send in patches. Here's some information to get you started.

The glibc manual source is maintained within the glibc source tree, in the manual directory. It is in Texinfo format, so if you're familiar with TeX, you're halfway there. The various chapters of the manual are in *.texi files, with a Makefile to build the manual in either info or html format.

To build the manual, create a directory within the source tree (or anywhere outside it is also fine), which I will refer to as the build directory from now on. This keeps the generated content separate from the code. Enter that directory and type the command:

$SRCDIR/configure --prefix=/usr

where $SRCDIR is the path to the sources. Don’t worry about the /usr prefix since we’re not going to install anything. If you’re especially worried about it, then use some other prefix like $HOME/sandbox or similar. Once configure succeeds, build the info format documentation using:

make info

or the html format using:

make html

The documentation gets built in the manual directory within the build directory. The html documentation is built in the libc directory within the manual directory. You can open index.html in a browser to browse through and verify your changes.

Contributing to glibc usually requires a copyright assignment to the FSF. If you don't mind doing this, the procedure is fairly easy, albeit time-consuming. All you have to do is post your patch on the libc-alpha (libc-alpha AT sourceware dot org) mailing list, and if copyright assignment is necessary for the change, the reviewer will let you know what to do. For help while writing a patch, the libc-help (libc-help AT sourceware dot org) mailing list is the place to go.

Comments

Multiprecision arithmetic in libm

Before my two week break, I was working on a bunch of ideas to try to get the multiprecision arithmetic performance in libm to not suck as badly as it currently does. There was a lot going on, so I'll try to summarize it here. The primary reason for this post is to get myself back into the groove (although I've tried to provide as much background as possible), so I apologize to readers if some of the content is not coherent. Feel free to point out anything unclear in the comments and I'll try to clarify.

The multiprecision bits in libm are essentially all the files starting with mp in $srcdir/sysdeps/ieee754/dbl-64/. The structure that stores a multiprecision number is called mp_no and is declared in mpa.h as:

typedef struct
{
  int e;
  double d[40];
} mp_no;

where e is the exponent of the number and the mantissa digits are in d. The radix of the number is 2^24, so each digit in d is always a non-negative integral value less than 2^24.
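To make the representation concrete, here is a rough sketch of how one could reconstruct the value of an mp_no (using the definition above) with p significant digits. This is illustrative only; from memory, the sign is stored in d[0] and the significant digits start at d[1], and the real conversion code in mpa.c is far more careful about rounding and overflow:

#include <math.h>

/* Approximate value of an mp_no X with p significant digits:
   d[0] * sum of d[i] * 2^(24 * (e - i)) for i = 1 .. p.  */
static double
mp_value_sketch (const mp_no *x, int p)
{
  double val = 0.0;

  /* Accumulate from the least significant digit upwards.  */
  for (int i = p; i >= 1; i--)
    val += ldexp (x->d[i], 24 * (x->e - i));

  /* d[0] holds the sign (1, -1, or 0 for zero), if I remember correctly.  */
  return x->d[0] * val;
}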

The other all-important module is mpa.c, which defines the basic operations on mp_no (construct from double, deconstruct to double, add, subtract, multiply, divide). It was relatively easy to see that these basic operations were the hotspots (specifically multiplication), but not so easy to see that the Power code has its own copy of these functions. And there begins the real difficulty.

The Power architecture is unique in that it has all of 4 floating point units. The Power-specific code takes advantage of that fact and is tweaked so that the execution units are used in parallel. In contrast, the Intel x86 architecture is quite weak on floating point units, and this is where the major conflict is. The mantissa digits of mp_no, which are currently doubles (but do not need to be, since they only ever hold integral values), are perfect for Power, but not good enough for Intel, which has much faster fixed point computation. Converting between doubles and ints is too slow on both platforms and is hence not a good alternative.

A possible approach is using a mantissa_t typedef that is then overridden by Power, but I need to do some consolidation in the rest of the code to ensure that the internal structure of mp_no is not exposed anywhere. So that’s a TODO.
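Roughly, the direction would be something like this (a sketch of the idea, not committed code; the choice of a 64-bit integer for the generic case is my guess at what would suit x86 best):

#include <stdint.h>

/* Generic mpa.h: an integral digit type for the mantissa.  */
typedef int64_t mantissa_t;

typedef struct
{
  int e;
  mantissa_t d[40];
} mp_no;

/* A Power-specific override (e.g. a sysdeps/powerpc version of the header)
   would instead keep the digits as doubles to feed its floating point
   units:

   typedef double mantissa_t;  */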

Apart from exploiting architecture-specific traits, the other approach I considered was to tweak the multiplication algorithm to make it as fast as possible in a platform-independent manner. A significant number of multiplication inputs are numbers that do not use the full precision of mp_no, i.e. a large number of the mantissa digits are zeroes. The current algorithm blindly loops over the entire precision, multiplying these zeroes, which is a waste. A better idea is to find the actual precision of the numbers and then run the multiply-add-carry loop only to the extent of that precision. The result was a performance improvement of about 36% in the pow function, i.e. a bit less than twice the speed of the earlier algorithm!
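The gist of that change, as a simplified sketch (this is not the actual glibc code: the digit layout and carry handling are simplified, exponent handling is left out, and the helper that finds the used precision is made up for illustration):

#include <stdint.h>

#define RADIX ((int64_t) 1 << 24)

/* Digits are stored most significant first in this sketch, as in mp_no.
   The number of digits actually in use is the index just past the last
   nonzero digit; numbers that do not need the full precision have a long
   run of trailing zero digits.  */
static int
used_digits (const int64_t *x, int n)
{
  while (n > 0 && x[n - 1] == 0)
    n--;
  return n;
}

/* Multiply two n-digit mantissas into a (2 * n)-digit result, running the
   multiply-add loop only over the digits that are actually in use instead
   of blindly looping over the full precision.  */
static void
mp_mul_sketch (const int64_t *x, const int64_t *y, int64_t *r, int n)
{
  int px = used_digits (x, n);
  int py = used_digits (y, n);

  for (int k = 0; k < 2 * n; k++)
    r[k] = 0;

  for (int i = 0; i < px; i++)
    for (int j = 0; j < py; j++)
      r[i + j + 1] += x[i] * y[j];

  /* Propagate carries so that every digit ends up below the radix.  */
  for (int k = 2 * n - 1; k > 0; k--)
    {
      r[k - 1] += r[k] / RADIX;
      r[k] %= RADIX;
    }
}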

Next steps are to evaluate the impact of storing the actual precision of numbers so that it does not have to be computed within the multiplication function. That’s another TODO for me.

Finally, there is a lot of cleanup in order in pretty much all of the mathlib code (i.e. the stuff in sysdeps/ieee754/dbl-64), which I've been chugging away at and will continue to chug away at. I'm sure the glibc community would love to review patch submissions that even simply reformat these files in the GNU style, so that's a good way to get started with contributing code to glibc.

Comments