Hell Oh Entropy!

Life, Code and everything in between

Day 0, Fedora APAC Budget Planning FAD: Pre-Release Party and a lot of Tom Yam Soup!

These months have been very busy for me on the Fedora front and for a change, my involvement in Fedora has been very non-technical. After years of shying away from it, I finally became a Fedora Ambassador and have been involving myself in a lot more non-technical things like organizing events and attending meetings. One such meeting I was looking forward to recently was the face to face meeting of some APAC ambassadors to plan for the budget for FY16 at Phnom Penh, Cambodia. This series of posts is a report of the event as I saw it.

My travel itinerary for Cambodia was fairly packed; I was to fly in on Friday night and then fly out early morning on Monday. I would have ideally liked to conduct some workshops around systems programming but it seems like the aversion to any kind of low level programming is even worse in Cambodia than in India. Maybe the situation will be different in future. However, my packed schedule also meant that I would have missed the Fedora 21 release party (F21 is not out yet, but the party was already planned and the organizers did not want to shift it or rename it) on Friday.

But Somvannda would make sure I didn’t, at least not all of it.

I reached the hotel room in Phnom Penh at about 9PM and Somvannda had arranged for me to be taken immediately to the DAI office for the Fedora Release Party. The cake, talks and a lot of the food were over, but the people and drinks were still there. I got a very warm welcome from Somvannda, Nisa, Tuan, Izhar, Danishka and Sirko and they also introduced me to Greta, the host of the party. After quick introductions, we had a few informal discussions about what we were going to talk about over the next two days, but we were mostly just drinking and eating whatever was left. I nibbled away even though I had been stuffing myself with food (Tom Yam soup FTW!) at the Bangkok airport while I had waited for 5 hours for my connection to Phnom Penh; the dry pastries were amazing!

The only remaining member of the party was Alick and he was not expected till about midnight. All of us waited up till he arrived, said our hellos and then turned in for the night. The next two days were going to see a lot of action, but we didn’t know it then.

Professor Shonku

I have been using a static blogging tool Kushal wrote, called shonku for quite a while now. It works well for all of the basic functionality and for things that don’t work, you get to fix them ;)

Fedora Activity Day at Pune: Towards a more secure Fedora

Huzaifa had wanted to do a Security FAD in Pune for a while to tackle the really high number of open security bugs in Fedora. We had initially set a date for September but we pushed it forward since Huzaifa was not available. In the end, Huzaifa was not available even on the rescheduled date, so PJP took over ownership of the event.

I wasn’t expecting a lot of people to attend given the nature of the activity and as it turned out, there were 14 signups with 7 showing up finally. We also had a few people joining remotely, which was awesome. We also had a Docker event running in parallel at the venue (the Red Hat Pune office), so we had more company at lunch.

Everyone barring PJP came in on India Standard Time, i.e. late by a few minutes to an hour or so. We started a bit late as a result, with a quick introduction to security in Fedora by PJP. After the talk and questions we didn’t waste any time and quickly got down to triaging security bugs. Our plan of action was to take ownership (by setting fst_owner= in the bugzilla whiteboard) of security bugs we understand and start working on driving them to conclusion. What this implied was that we would have to follow up after the FAD to ensure that the bugs were closed.

I started from the oldest bugs (dating back to 2011!) and managed to own 8 bugs by the end of the day. We had many a spirited discussion over what constituted a security bug (most of us understood OS security to a fair extent, but were not security experts) and my impression was that all of us went home a bit wiser. I learned that xen is a horrible horrible package - it bundles a bazillion projects into itself, due to which fixing flaws in the original project is not sufficient and xen would need to be checked and fixed separately.

Overall we had a pretty good day where 36 bugs got new owners - we managed to reduce the total backlog (of unowned bugs) from 370 to 334. Hopefully some of us will continue to work in our spare time (I know I’ll try) and bring that backlog down further.

Understanding malloc behaviour using Systemtap userspace probes

A blog post I wrote on Understanding malloc behaviour using Systemtap userspace probes on the Red Hat Developer Blog has now been published. I got a query about a follow-up post with example usage, which I hope to be able to work on soon-ish.

Buggy HLE, microcode updates and SIGILLs

Update: Disabling lock elision in glibc doesn’t seem to be sufficient. Either way, the Fedora kernel folks will have an update in place to update the microcode early by default so that both the kernel and the first instantiation of pthreads will see HLE disabled. So read the story as something interesting that we did but didn’t quite work. It was fun though…

Amit and I ran into an interesting problem today with his new Haswell processor based system. A fully updated Fedora 21 alpha would fail during boot and fall into the maintainer shell. The systemd journal showed that systemd-udevd was crashing with a SIGILL, which seemed strange. The core dump revealed the problem:

(gdb) x/i $rip
=> 0x7f68b0b978ba <pthread_rwlock_rdlock+186>:  xbeginq 0x7f68b0b978c0 <pthread_rwlock_rdlock+192>

The xbeginq instruction is a TSX instruction (strictly an RTM one, which is what glibc’s lock elision uses), so the first thing that came to mind was the recent errata that Intel pushed out, effectively announcing that HLE and RTM were buggy and that they were going to disable them soon. We looked at /proc/cpuinfo expecting to find hle and rtm missing, but were even more confused to find that they were present.

After much tinkering about, Amit made a vague reference to microcode_ctl being able to change CPU microcode on the fly. It took a while to hit us, but we finally realized that we had found the culprit. microcode_ctl had been updated with the latest Intel microcode update. We initially thought that it ought to be a one-time problem since the microcode would be flashed into the cpu and later everything would work, but then we found out that the microcode needs to be flashed on every boot.

So the root cause was that the microcode update would happen late enough that systemd was already up and had read the hle bit, thus enabling lock elision support in systemd. Since the kernel had already read in the CPU capabilities by then, it did not have the updated capabilities either, which is why we continued seeing hle and rtm set in cpuinfo.
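If you want to see what the kernel currently believes, a small script can look for the two flags in /proc/cpuinfo. This is only a sketch: the helper name tsx_flags is made up for illustration, and as described above the flags reflect what the kernel read at boot, so they say nothing about a microcode update that landed later.

```python
import os


def tsx_flags(cpuinfo_text):
    """Return the subset of {'hle', 'rtm'} found in the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        name, _, value = line.partition(":")
        if name.strip() == "flags":
            return {"hle", "rtm"} & set(value.split())
    return set()


# /proc/cpuinfo only exists on Linux, so guard the real lookup.
if os.path.exists("/proc/cpuinfo"):
    with open("/proc/cpuinfo") as f:
        print(sorted(tsx_flags(f.read())))
```

An empty list means the kernel did not see either flag when it started.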

As a result, thanks to the microcode update, all Haswell based F21 alpha systems are essentially unbootable. Carlos is now fixing this by disabling lock elision completely in the glibc build. Work is in progress for rawhide, F21 and F20 as I write this, so the impact of this will hopefully be minimal. If you do run into this problem, all you have to do is downgrade the microcode_ctl package and pin it so that it doesn’t get updated till the glibc update becomes available.

NOTABUG in glibc

The glibc malloc implementation has a number of heap consistency checks in place to ensure that memory corruption bugs in programs are caught as early as possible and the program aborted to prevent misuse of the bug. Memory corruption through buffer overruns (or underruns) is often an exploit vector waiting to be ‘used’, which is why these consistency checks and aborts are necessary.

If the heap of a program has been found to be corrupted, the program is terminated with an error that usually looks something like this:

*** glibc detected *** ./foo: double free or corruption (!prev): 0x0000000001362010 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x78a96)[0x7f3df63aea96]
/lib64/libc.so.6(cfree+0x6c)[0x7f3df63b2d7c]
./foo[0x400e7c]
/lib64/libc.so.6(__libc_start_main+0xed)[0x7f3df635730d]
./foo[0x4008f9]
======= Memory map: ========

and when one looks at the core dump, the top of the call stack is all inside glibc:

Program terminated with signal 6, Aborted.
#0  0x00007fd0273b6925 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64    return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) bt
#0  0x00007fd0273b6925 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00007fd0273b8105 in abort () at abort.c:92
#2  0x00007fd0273f4837 in __libc_message (do_abort=2, fmt=0x7fd0274dcaa0 "n not possible due to RF-kill") at ../sysdeps/unix/sysv/linux/libc_fatal.c:198
#3  0x00007fd0273fa166 in malloc_printerr (action=3, str=0x7fd0274daa5e "/proc/self/maps", ptr=) at malloc.c:6332
#4  0x00007fd0273fdf9a in _int_malloc (av=0x7fd027713e80, bytes=) at malloc.c:4673

The common mistake one may make here is to assume that it is a glibc bug because the crash is ‘caused’ by glibc. That is the equivalent of killing the whistleblower. The crash is indeed caused by glibc, but the bug is not in glibc. glibc has only caught the bug after it has happened and halted execution of the program.

And if you think glibc is overstepping its bounds by halting the program, you could tell it to not abort by exporting the MALLOC_CHECK_ environment variable set to either 0 (completely silent) or 1 (prints the message on stderr). Of course, you have to be smoking something very exotic to do that instead of finding and fixing the bug.
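For completeness, here is a rough sketch of what setting that variable for a single run could look like from a script, say to compare the diagnostics while you hunt for the real bug. The helper names are made up for illustration, it only covers the two values mentioned above, and the handling of MALLOC_CHECK_ has changed in later glibc releases, so treat it as a description of the behaviour at the time of writing.

```python
import os
import subprocess


def malloc_check_env(level, base_env=None):
    """Return a copy of the environment with MALLOC_CHECK_ set to level."""
    if level not in (0, 1):
        raise ValueError("this sketch only covers the values 0 and 1")
    env = dict(os.environ if base_env is None else base_env)
    env["MALLOC_CHECK_"] = str(level)
    return env


def run_with_malloc_check(argv, level):
    """Run argv with MALLOC_CHECK_ set and return its exit status."""
    return subprocess.call(argv, env=malloc_check_env(level))
```

Something like run_with_malloc_check(["./foo"], 1) would then run the crashing program from the example above with the message going to stderr.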

And speaking of speaking...

For many years now I actively avoided speaking on non-technical topics, especially FOSS evangelism, mostly because I wanted to focus on my personal technical development. I’ve realized over time that I can do both together and that resulted in me volunteering for more non-technical tasks, including evangelizing FOSS in various gatherings.

It was with this change of heart that I accepted Amit Kale’s request to speak on Open Source at the MIT college technical festival (Teknothon) in Pune.

I wasn’t sure what the audience would be like and I wasn’t even sure how I would approach the topic. To top that I caught the flu from my sister on Friday, leading to me almost cancelling the talk. I didn’t cancel however because the students from MIT had called and emailed multiple times to make sure that I was able to come. And so I went today.

I reached an hour early and found a couple of people talking about (and later demonstrating) rooting of Android phones. I later found out that they were students. In fact, as it turns out, I was the only ‘external’ guest at this fest. The amazing thing about the session was that they seemed to be talking the language of Free Software developers, talking about the freedom to do whatever they wanted to with their device, hacking up modules for their devices and so on. It was really refreshing to hear that.

My talk started with a flower bouquet and a gift, which was a bit embarrassing, but well-meant. The more embarrassing part was the introduction a student gave of me, describing me as some kind of OSS superstar. Thankfully that ended in under a minute and I was then allowed to talk to the students.

I did not have a lot of expectations from the audience because I had assumed that most attendees would have to be introduced to the concept of Open Source, but that wasn’t the case at all. Most of the students used Linux (mostly Ubuntu and a couple of Fedora) and had an idea of what Open Source was, but didn’t have the words to describe it. The only other thing they seemed to lack was the awareness that they could change code in the operating system they were running. I hope I was able to clear that up for them. I had made slides for the talk but like always they were mostly useless and I just ended up talking to them directly. The questions were surprisingly insightful too, like licensing of code, standards, avenues for earning and so on. Overall I had a very fulfilling session and I hope it was the same for them too.

Fedora Activity Day at Pune

We had a Fedora Activity Day at the Red Hat office today in Pune. The FUDCon at the College of Engineering, Pune was the last major Fedora event that I was part of in Pune, so I was looking forward to the FAD to finally reboot my active involvement in Fedora.

Most of the organizers were not very familiar with arranging Fedora events. Some of us had participated in them and even helped during FUDCon, but actually planning everything on our own seemed quite difficult. We also did not want an event where people came, attended talks and went away, which is why a FAD seemed like the best option. To make sure that we didn’t end up just meeting and getting to know each other, we decided on a single theme, which is testing the upcoming Fedora 21.

Given that we had no clue what to expect, we didn’t ask for any sponsorship, just a room and internet from the Red Hat Pune office. The other difference was that we also invited people to participate remotely over IRC and we got a decent response on that front too.

I had decided to run the F21 installer through the grinder, but changed my mind the previous day and decided to test glibc. On the day, I changed my mind again and started testing the KDE Live ISO. People started trickling in a little after 9 and soon we had almost everyone who had signed up to come. There were a lot of lively discussions over bugs and everyone cross-checking with each other on bugs before filing them. Prasad did a little session on DNSSEC to get more people to test DNSSEC on F21.

Lunch was ordered and as it turned out, I don’t have a clue how hungry hackers get after a session of serious testing. We ended up under-ordering thanks to my estimation skills and some of us had to supplement our diet with cup noodles. That wasn’t enough of a damper for anyone though, as people ploughed on after lunch. I managed to file 4 bugs, all against anaconda. Kashyap did a short session on virtual machine snapshots and had quite a few people actively trying it out, while others tested ON_QA bugs to give karma.

Towards the end of the day, I downloaded gnulib trunk to run its tests against F21 glibc. I found a few additional failures, but I couldn’t work through them because I had to leave for home. I need to close that one some day, hopefully sooner rather than later. In the end, we had a very fruitful day of testing with over 8 components covered and about 15 bugs filed, not including some that were already filed. I’m already looking forward to having another hackfest or bugfest.

File offsets, active handles and stdio

Back in 2012 I wrote a patch to optimize ftell so that it doesn’t always flush buffers when it just has to report the current file offset. That patch unleashed a whole lot of pain due to the way in which the ftell code was written - it shared code with fseek, which is semantically different from ftell in a few ways. The general mess in libio code didn’t help matters much either.

Since that patch, a number of fixes went in to correct broken behaviour and it culminated in me essentially rewriting ftell. The main problems encountered were related to caching of the file offset in the underlying file to avoid a syscall, so Carlos suggested I write up a wiki document explaining the various scenarios. So I wrote File offsets in stdio stream and ftell in the glibc wiki.
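The core idea, reporting the stream offset without flushing, is easy to see from a scripting language too. The sketch below uses Python's buffered I/O, which is not glibc's libio and does not share its caching of the file offset, so it only illustrates the concept: the logical offset of the stream is the offset of the underlying descriptor plus whatever is still sitting in the buffer. The function name is made up for illustration.

```python
import os
import tempfile


def offsets_after_buffered_write(data):
    """Write data without flushing; return (stream offset, descriptor offset)."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as stream:  # default (buffered) mode
            stream.write(data)
            logical = stream.tell()
            # Ask the kernel where the descriptor itself currently is.
            underlying = os.lseek(stream.fileno(), 0, os.SEEK_CUR)
        return logical, underlying
    finally:
        os.remove(path)


print(offsets_after_buffered_write(b"hello"))
```

On CPython the stream reports 5 while the descriptor is still at 0, since the five bytes have not been written out yet.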

Since this is a new ftell implementation, we’d love to get feedback on correctness (as bug reports) and performance (as bug reports, email, tweets, etc.).

Setting up patchwork on Dreamhost

We have been talking about having a patch review system in place for glibc for some time now since the volume of patches has been steadily increasing and we don’t have enough reviewers to go through them in time, leading to missed patches and general contributor unhappiness. Due to the fact that we’re a primarily email driven project, we needed something that fits into our current workflow and patchwork was the obvious choice to start with. I decided to do a setup on my domain first to get a feel of things before I made a request to set up a patchwork instance on sourceware. The instance is live now on patchwork.siddhesh.in for glibc contributors to get a feel of it.

The hard part about patchwork is that the documentation is a myth. There is an INSTALL file that sort of works, except that it doesn’t the moment you decide to use slightly different settings. Additionally, the instructions are targeted at dedicated hosting providers, so they almost completely don’t apply to someone trying to set up patchwork on Dreamhost on their shared hosting account. Of course, figuring out what to do with your patchwork installation once it is done is an adventure as well, since there seems to be no user documentation at all. Anyway, here’s how I did it:

Setting up the server and getting sources

I assume you have a shell account for the user that would be administering the subdomain, since you’d be doing a fair bit of sysadminy stuff on it. Also, it’s assumed that you’re hosted on a Linux based server; I don’t really care about how it works for Windows based hosting.

Create your Dreamhost subdomain using the control panel and make sure you have Passenger support enabled. The Passenger support creates a directory called public in your subdomain directory. Don’t bother setting up django at this stage.

Now get patchwork from their git repo:

$ git clone git://ozlabs.org/home/jk/git/patchwork

and copy the contents (i.e. whatever is inside patchwork, not the directory itself) into your subdomain directory. The contents of your subdomain directory would then be something like this:

$ ls
apps  docs  htdocs  lib  public  templates  tools

Now remove the public directory and create a symlink from htdocs.

$ ln -s htdocs public

We could technically just copy things over from htdocs to public, but it’s easier to update this way.

Next, we need django. patchwork needs django 1.5.x, so if your server doesn’t have it, you’ll need to download the sources yourself. To check the installed django version:

$ python
Python 2.6.6 (r266:84292, Dec 26 2010, 22:31:48) 
[GCC 4.4.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import django
>>> django.VERSION
(1, 2, 3, 'final', 0)
>>> 
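The same check can be scripted if you would rather not eyeball the tuple. This is a minimal sketch with a made-up helper name; it only compares the major and minor numbers against the 1.5 series that patchwork wants.

```python
def needs_local_django(installed, wanted=(1, 5)):
    """True if the installed version tuple is not in the wanted series."""
    return tuple(installed[:2]) != tuple(wanted)


# On the server you would pass django.VERSION; here we use the tuple
# printed by the interpreter session above.
print(needs_local_django((1, 2, 3, 'final', 0)))
```

For the server above this prints True, i.e. we need our own copy of django.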

Since we don’t have 1.5.x, we install django from sources in lib/packages in the subdomain directory:

$ git clone https://github.com/django/django.git -b stable/1.5.x

Create a directory lib/python in your subdomain directory and symlink the django installation in it:

$ ln -s ../packages/django/django ./django

Configuring django and patchwork sources

The first thing to configure is the database. From your dreamhost control panel, create a mysql database and user. Have that information handy to put in your django/patchwork configuration.

The default settings for patchwork (and django) are in apps/settings.py. We need to override those by creating our own file called apps/local_settings.py. The first thing to go in our local_settings.py is our database configuration:

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'db_name',
        'USER': 'db_username',
        'PASSWORD': 'super_secret_password',
        'HOST': 'mysql.myhost.name',
        'PORT': ''
    },
}

The instructions in the patchwork documentation mention using DATABASE_* style variables, but they didn’t work for me, probably because I figured out that I had an older version of django after I was done with the initial configuration.

Next you need to set the following variables in local_settings.py:

import os  # needed for the os.path.join() calls below

SECRET_KEY = 'a random generated long string'

ADMINS = (
     ('Super Admin', 'super@foo.com'),
)

TIME_ZONE = 'Asia/Kolkata'
LANGUAGE_CODE = 'en-us'
DEFAULT_FROM_EMAIL = 'Patchwork (foo.com) '
NOTIFICATION_FROM_EMAIL = DEFAULT_FROM_EMAIL

# If you change the ROOT_DIR setting in your local_settings.py, you'll need to
# re-define the variables that use this (MEDIA_ROOT and TEMPLATE_DIRS) too.
ROOT_DIR = '/path/to/patchwork.foo.com'
TEMPLATE_DIRS = (
    # Put strings here, like "/home/html/django_templates" or "C:/www/django/templates".
    # Always use forward slashes, even on Windows.
    # Don't forget to use absolute paths, not relative paths.
    os.path.join(ROOT_DIR, 'templates')
)
# Absolute path to the directory that holds media.
# Example: "/home/media/media.lawrence.com/"
MEDIA_ROOT = os.path.join(
    ROOT_DIR, 'lib', 'python', 'django', 'contrib', 'admin', 'media')

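The SECRET_KEY can be generated using the following python snippet:

import random, string
# SystemRandom draws from the OS entropy source, which suits a secret key better.
rng = random.SystemRandom()
chars = string.ascii_letters + string.digits + string.punctuation
print(repr("".join(rng.choice(chars) for i in range(50))))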

Other options are obvious from their names and values, so adjust them to your taste. ROOT_DIR is set to the directory where patchwork is, i.e. your subdomain directory. TEMPLATE_DIRS and MEDIA_ROOT are derived from ROOT_DIR, so I’ve just copied those over from settings.py.

Next up, we need to get static files for admin sessions into a place where django can find and serve them. They’re present in contrib/admin/static in your django installation and we need to copy them over to htdocs/static. Once this is done, we need to tell django that it can find the static files by adding the following configuration snippet to local_settings.py:

PROJECT_ROOT = os.path.normpath(os.path.dirname(__file__))
STATIC_ROOT = os.path.join(PROJECT_ROOT, 'static')
STATIC_URL='/static/'
ADMIN_MEDIA_PREFIX='/static/admin/'

INSTALLED_APPS = (
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.sites',
    'django.contrib.admin',
    'django.contrib.staticfiles',
    'patchwork',
)

Finally, we don’t want to spew out debugging messages to the server and we want to be able to debug problems at the same time, so we need logging support to be enabled. Add the following snippet to local_settings.py:

LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'handlers': {
        'file': {
            'level': 'DEBUG',
            'class': 'logging.FileHandler',
            'filename': '/path/to/patchwork.foo.com/django-debug.log',
        },
    },
    'loggers': {
        'django.request': {
            'handlers': ['file'],
            'level': 'DEBUG',
            'propagate': True,
        },
    },
}

making sure that the path in the ‘filename’ is writable by django. We can now disable debugging, so add this:

DEBUG=False

Now here’s the fun part: disabling debugging changes the behaviour of django, in that it suddenly starts doing extra checks, due to which you’ll start seeing failures like the one below:

SuspiciousOperation: Invalid HTTP_HOST header (you may need to set ALLOWED_HOSTS)

so fix this by adding the following:

ALLOWED_HOSTS = ['your.subdomain']

These are all the settings you would normally need to get patchwork up. Now we run manage.py from the apps directory (note that the instructions in INSTALL are slightly wrong here):

$ PYTHONPATH=../lib/python ./manage.py syncdb

This should initialize the database for patchwork and django. Follow whatever prompts come up till you come back to the shell. If the command throws errors, read up and fix your configuration. The django documentation is surprisingly good once you get used to the layout, so don’t despair if patchwork documentation doesn’t help (it won’t).

Getting patchwork up

With the database set up and the sources in place, one needs to tell apache how to serve content through django. We had set up passenger for precisely this, so we just need to add a python script in our subdomain directory to tell passenger what to do. The script should be named passenger_wsgi.py and its contents should be:

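import sys, os
# Assumes passenger runs this with the subdomain directory as the working directory.
basedir = os.getcwd()
# Prefer our own django in lib/python over the system-wide one.
sys.path.insert(0, os.path.join(basedir, 'lib/python'))
sys.path.append(basedir)
sys.path.append(os.path.join(basedir, 'apps'))
# The overrides in local_settings.py are applied on top of apps/settings.py.
os.environ['DJANGO_SETTINGS_MODULE'] = "apps.settings"
import django.core.handlers.wsgi
application = django.core.handlers.wsgi.WSGIHandler()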

This tells passenger where to find the app and what paths to use to find our custom django and any additional libraries and also our settings. At this point, browsing to your site should get you to a working patchwork installation. Log in to the admin page (which should be http://patchwork.foo.com/admin/) with the superuser name and password you had created while running manage.py.

The first thing you’ll need to do is create a Site. Make sure you delete the example.com there since it will otherwise be used first and you won’t like it. After the site change, if you start getting a server error, then look at your django log and look for this error:

DoesNotExist: Site matching query does not exist.

This likely means that there was some data that referred to example.com and broke. You’ll have to edit the row for your subdomain in the django_site table and change the id to 1.

Next, you add a project of your choice. It took me a while to understand what the names of the fields in the new project form meant, but maybe that was because it was getting very late. Anyway, Linkname is any name you wish to give to the project, which appears in the URL – I used glibc here. Name is a free form name for the project – I used GNU C Library. Listid is the List-ID header in the mailing list email that it should look for. Listemail is the email address for the mailing list. I just ignored the rest of the fields.
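If you are not sure what to put in Listid, it helps to look at the List-Id header of any mail you have received from the list. The sketch below pulls it out of a raw message with Python's email module. It is not patchwork's own parsing code, the helper name is made up, and the sample message (including the list id value in it) is invented for illustration.

```python
import email


def list_id(raw_message):
    """Return the List-Id of a raw message without angle brackets, or None."""
    header = email.message_from_string(raw_message).get("List-Id")
    if header is None:
        return None
    # The header may carry a free-form description before the <id> part.
    start, end = header.rfind("<"), header.rfind(">")
    if start != -1 and end > start:
        return header[start + 1:end].strip()
    return header.strip()


# A made-up message, only to show the shape of the header.
sample = (
    "From: someone@example.com\n"
    "List-Id: <libc-alpha.sourceware.org>\n"
    "Subject: [PATCH] example\n"
    "\n"
    "body\n"
)
print(list_id(sample))
```

Run it on a real mail from your list and use whatever it prints as the Listid.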

Getting emails from the project

Patchwork is useless without emails, so subscribe to the mailing list whose patches you want to monitor. You obviously need an email address for it, so create an email address first. There is usually a confirmation step involved in mailing list subscription, which should also be completed.

The next step is to forward emails from this account to the shell account where the site is running. To do this, dreamhost has a wiki document that describes how to link an email address to a shell account email. Follow that and send test emails to make sure that emails are reaching the shell user.

Now we need a mechanism to forward the mails received by the shell user to the parsing script in patchwork. This is easy to set with a forwarder file for postfix, called .forward.postfix in your home directory with the following contents:

"|/path/to/subdomain/apps/patchwork/bin/parsemail.sh"

Now you should start seeing patches in your project view on the website.

Sending emails from patchwork

This bit is fairly straightforward – patchwork needs to be able to send emails via SMTP for various events or for registration confirmation. Django needs the following information in local_settings.py to be able to send these emails, or else it uses the local SMTP, which won’t work on Dreamhost:

EMAIL_HOST = 'mail.patchwork.foo.com'
EMAIL_HOST_USER = 'from@patchwork.foo.com'
EMAIL_HOST_PASSWORD = 'supersecret'
EMAIL_USE_TLS = True

Getting started with patch reviews

When browsing, you’ll quickly figure out that you can’t really do anything with the patches you see there, unless you have sent a patch, i.e. your email address configured in patchwork is the same as the email address of the patch sender. This is not really very useful as far as peer reviews are concerned and working through the UI doesn’t tell you anything. One would obviously gravitate towards the admin UI to see if there are any settings there in the Users or Groups or Projects or People sections, but there are none.

The option is effectively hidden in plain sight in the User Profiles section. Click through to your user (or any user you're interested in updating) and come to the page that shows the projects that you're part of (Primary Projects) and projects that you're maintainer of (Maintainer Projects). The list of projects that show up next to Maintainer Projects aren't projects you're maintainer of, unless they're selected! So select the project(s) you are maintainer of and save your changes. Now when you login as that user, you'll see options to change patch review state and even delegate to other reviewers.

Final thoughts

It was quite an exhausting experience to get patchwork working and it wasn't just because it was on dreamhost. The project is fairly poorly documented and the usual clumsiness associated with web apps didn't help things. My relative inexperience with django may have compounded the problems I had getting this up, but I would again blame that on clumsy webapp syndrome.

I have written this down with the hope that someone else looking to do this would be able to get patchwork up in a bit less time than I took, and also because we may have to do it again on sourceware when there is consensus on using it. Given my wonderful memory, I'll probably end up making all the mistakes once again when I try it out the next time.
