Life in Code

Thoughts on technology from a veteran programmer.

Some KVM development community stats

Today I made a presentation (pdf) on the Linux Kernel Virtual Machine to the Red Hat Cloud Computing Forum. I enjoyed the format. All the presentations were short (30 minutes including Q&A) and technical. This is the type of forum I enjoy attending, so it’s easy to prepare and I am comfortable with the audience.

KVM Developer Participation

One of the topics I covered in the presentation is the level of KVM development activity in 2009. To measure the depth and breadth of participation in KVM development, I used activity on the developer mailing lists for the three primary components of the KVM hypervisor: KVM, which provides the virtual machine monitor; Qemu, which provides the virtual machine environment; and libvirt, which provides the low-level management interfaces.

I like to think that monitoring the traffic on an open-source project’s mailing list is a lot like gathering intelligence through traffic analysis. You can learn who is working on a project, what specific areas they are working in, and with whom they are working. The volume of traffic is also a good indicator of the weight behind a project and the overall development velocity. If you were really ambitious you could graph the relationships among various projects based on the participation of specific individuals.

Below is a summary of the raw statistics for 2009:

kvm development community statistics 2009

What can we learn from Raw Message Counts?

These three mailing lists are dedicated to development activity. There are three types of messages included in the analysis:

  1. Source code. All source code changes are submitted as email messages. Per conventions for Linux kernel development, the subject line of these messages usually includes the tag “[PATCH].”
  2. Source code review. When a developer submits some proposed changes, analysis and discussion of the source code generates replies to the original email. Developers use email clients that support message threading, which makes it easier to follow the discussion.
  3. Bug reports.

All three of the activities are part of what we typically consider the job description of a software engineer. There is another sub-category of messages that intermix design proposals with source code submissions. These messages usually include the tag “[RFC]” in the subject line.
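As a rough sketch of how those categories could be told apart by subject line alone (this is not the tooling behind the numbers in this post): the function name `classify_subject`, the bucket names, and the sample subjects are all made up for illustration, and treating every "Re:" message as review is a simplifying assumption.

```python
import re
from collections import Counter

# Assumption: tags look like "[PATCH]", "[PATCH v2 1/3]", "[RFC]", "[RFC PATCH]".
TAG_RE = re.compile(r"\[(PATCH|RFC)[^\]]*\]", re.IGNORECASE)
REPLY_RE = re.compile(r"^\s*re:", re.IGNORECASE)


def classify_subject(subject):
    """Return 'review', 'rfc', 'patch', or 'other' for one subject line."""
    if REPLY_RE.match(subject):
        # A reply in a thread: treated here as review or discussion.
        return "review"
    match = TAG_RE.search(subject)
    if match:
        # The first word inside the brackets decides the bucket.
        return match.group(1).lower()
    # Bug reports and untagged mail end up here.
    return "other"


# Hypothetical subject lines, not taken from the real lists.
subjects = [
    "[PATCH 1/3] KVM: fix a typo",
    "Re: [PATCH 1/3] KVM: fix a typo",
    "[RFC] a new design proposal",
    "guest hangs on boot",
]
counts = Counter(classify_subject(s) for s in subjects)
print(counts)
```

A subject-only heuristic like this will miscount some messages (a reply to a bug report lands in "review", for instance), so it is only good for a first approximation.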

Because these messages are the day-to-day work of software engineering, I believe that message counts for mailing lists dedicated to software development provide a good indication of the health of a development community. For KVM, the statistics are impressive.

  • Almost 400 organizations participated in KVM development, ranging from large corporations such as IBM, Intel, and Red Hat, to academic institutions and individual contributors.
  • Approximately 800 unique contributors. This is an extremely broad group of software developers.
  • A solid core of “super contributors,” developers who form the top tier of the project’s contributors.

Top Individual Contributors

It’s also good to look at the top individual contributors. These are the folks who are generally 100% focused on the project and are the most prolific programmers.

KVM-Devel

Messages  Sender (address user, then domain)
3810    avi redhat
1261    mst  redhat
851     gleb redhat
799     mtosatti redhat
507     ghaskins novell
453     anthony codemonkey
410     lmr redhat
394     agraf suse
362     sheng linux intel
357     jan.kiszka siemens
356     glommer redhat
336     mgoldish redhat
226     amit.shah redhat
223     jan.kiszka web de
209     markmc redhat
208     alex.williamson hp
197     joerg.roedel amd
178     mhiramat redhat

Qemu

Messages  Sender (address user, then domain)
1839    anthony codemonkey
1457    kraxel redhat
1447    quintela redhat
961     avi redhat
819     aurelien aurel32
805     lcapitulino redhat
805     blauwirbel gmail
745     mst redhat
617     yamahata valinux
565     agraf suse
558     aliguori ibm
540     jan.kiszka siemens
493     markmc redhat
468     paul codesourcery
425     gleb redhat
418     jamie shareable
407     av1474 comtv
402     armbru redhat
391     glommer redhat
371     kwolf redhat

Comments on Individual Developer Counts

Looking at the individual contributor counts for KVM and Qemu, it is clear that the top contributor on each list is quite a bit more active than the next-highest contributor. (On the Qemu list, anthony codemonkey and aliguori ibm are the same person posting under two different addresses, which is not uncommon. This means Anthony’s actual message count is close to 2,400: 1839 + 558 = 2397.) Avi Kivity is the KVM maintainer, and Anthony Liguori is the Qemu maintainer. It’s the job of the maintainer to review and accept all code submissions, and to package and announce new releases of the code, so you would expect the maintainer of a project to post the most messages.
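Folding several addresses into one person is easy to sketch. The example below uses three rows from the Qemu table; the hand-maintained `aliases` map and the helper `merge_aliases` are assumptions for illustration, not part of the original analysis tools.

```python
from collections import Counter

# Three rows from the Qemu table: (user, domain) -> message count.
qemu_counts = {
    ("anthony", "codemonkey"): 1839,
    ("kraxel", "redhat"): 1457,
    ("aliguori", "ibm"): 558,
}

# Assumption: a hand-maintained map from address to a person label.
aliases = {
    ("anthony", "codemonkey"): "Anthony Liguori",
    ("aliguori", "ibm"): "Anthony Liguori",
}


def merge_aliases(counts, aliases):
    """Sum message counts per person, folding known aliases together."""
    merged = Counter()
    for address, messages in counts.items():
        # Addresses without an alias entry keep their "user domain" label.
        person = aliases.get(address, " ".join(address))
        merged[person] += messages
    return merged


merged = merge_aliases(qemu_counts, aliases)
print(merged["Anthony Liguori"])  # 1839 + 558 = 2397
```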

You can also see from these message counts that there is a large overlap of top contributors to the KVM and Qemu projects. In fact Avi Kivity is a top contributor to Qemu, and Anthony Liguori is a top contributor to KVM.
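The overlap can be checked directly from the two tables above with a set intersection. This is only a sketch: it compares the "user domain" strings exactly as listed, so a person posting from two addresses is treated as two senders.

```python
# Senders copied from the KVM-Devel table above.
kvm_top = {
    "avi redhat", "mst redhat", "gleb redhat", "mtosatti redhat",
    "ghaskins novell", "anthony codemonkey", "lmr redhat", "agraf suse",
    "sheng linux intel", "jan.kiszka siemens", "glommer redhat",
    "mgoldish redhat", "amit.shah redhat", "jan.kiszka web de",
    "markmc redhat", "alex.williamson hp", "joerg.roedel amd",
    "mhiramat redhat",
}

# Senders copied from the Qemu table above.
qemu_top = {
    "anthony codemonkey", "kraxel redhat", "quintela redhat", "avi redhat",
    "aurelien aurel32", "lcapitulino redhat", "blauwirbel gmail",
    "mst redhat", "yamahata valinux", "agraf suse", "aliguori ibm",
    "jan.kiszka siemens", "markmc redhat", "paul codesourcery",
    "gleb redhat", "jamie shareable", "av1474 comtv", "armbru redhat",
    "glommer redhat", "kwolf redhat",
}

# Addresses that appear among the top posters on both lists.
both = sorted(kvm_top & qemu_top)
for sender in both:
    print(sender)
print(len(both), "senders appear in both tables")
```

With the tables as listed, eight addresses show up on both lists, including both maintainers.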

See for Yourself

You can read the kvm-devel and Qemu mailing lists via the web using Gmane.

kvm-devel

qemu

Source code for Analysis Tools

I wrote a couple of crude utilities to do this mailing list analysis. They are:

string_search.c: a utility that understands email address strings and can process them and count instances of specific addresses in a file.

mbox-filter.py: a Python utility that filters an mbox-formatted email file in a number of different ways. I use it, for example, to collect all messages that fall within a certain range of dates.

Perhaps in a future post I’ll document these utilities and enhance them. They are very crude at this point. Once I got the info I needed out of the mbox files I stopped working on them.
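Until then, here is a minimal sketch of the same two ideas (filter an mbox by date range, then count sender addresses) using only the Python standard library. It is not the code of string_search.c or mbox-filter.py; the function names and the file name `kvm-devel.mbox` are hypothetical, and treating a zone-less Date header as UTC is an assumption.

```python
import mailbox
from collections import Counter
from datetime import datetime, timezone
from email.utils import parseaddr, parsedate_to_datetime


def in_date_range(message, start, end):
    """True if the message's Date header falls in [start, end)."""
    raw = message.get("Date")
    if raw is None:
        return False
    try:
        when = parsedate_to_datetime(str(raw))
    except (TypeError, ValueError):
        return False  # unparseable Date header: skip the message
    if when.tzinfo is None:
        # Assumption: treat a date without zone information as UTC.
        when = when.replace(tzinfo=timezone.utc)
    return start <= when < end


def count_senders(messages, start, end):
    """Count messages per sender address within the date range."""
    counts = Counter()
    for message in messages:
        if in_date_range(message, start, end):
            _, address = parseaddr(str(message.get("From", "")))
            if address:
                counts[address.lower()] += 1
    return counts


def count_mbox(path, start, end):
    """Same as count_senders, reading messages from an mbox file."""
    return count_senders(mailbox.mbox(path), start, end)


if __name__ == "__main__":
    year_start = datetime(2009, 1, 1, tzinfo=timezone.utc)
    year_end = datetime(2010, 1, 1, tzinfo=timezone.utc)
    # "kvm-devel.mbox" is a placeholder path for a downloaded list archive.
    for address, n in count_mbox("kvm-devel.mbox", year_start, year_end).most_common(20):
        print(n, address)
```

Counting by full address keeps the sketch simple; merging a person's multiple addresses would be a separate step.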

Written by mdday

February 10th, 2010 at 4:28 pm