Some KVM development community stats
Today I made a presentation (pdf) on the Linux Kernel Virtual Machine to the Red Hat Cloud Computing Forum. I enjoyed the format. All the presentations were short (30 minutes including Q&A) and technical. This is the type of forum I enjoy attending, so it’s easy to prepare and I am comfortable with the audience.
KVM Developer Participation
One of the topics I covered in the presentation is the level of KVM development activity in 2009. To measure the depth and breadth of participation in KVM development, I used activity on the developer mailing lists for the three primary components of the KVM hypervisor: KVM, which provides the virtual machine monitor; Qemu, which provides the virtual machine environment; and libvirt, which provides the low-level management interfaces.
I like to think that monitoring the traffic on an open-source project’s mailing list is a lot like gathering intelligence through traffic analysis. You can learn who is working on a project, what specific areas they are working in, and with whom they are working. The volume of traffic is also a good indicator of the weight behind a project and the overall development velocity. If you were really ambitious you could graph the relationships among various projects based on the participation of specific individuals.
Below is a summary of the raw statistics for 2009:
What can we learn from Raw Message Counts?
These three mailing lists are dedicated to development activity. There are three types of messages included in the analysis:
- Source code. All source code changes are submitted as email messages. Per conventions for Linux kernel development, the subject line of these messages usually includes the tag “[PATCH].”
- Source code review. When a developer submits some proposed changes, analysis and discussion of the source code generates replies to the original email. Developers use email clients that support message threading, which makes it easier to follow the discussion.
- Bug reports.
All three of the activities are part of what we typically consider the job description of a software engineer. There is another sub-category of messages that intermix design proposals with source code submissions. These messages usually include the tag “[RFC]” in the subject line.
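The message types above can be roughly separated by looking at the subject line. The Python sketch below is illustrative and is not the analysis used to produce these statistics. The function names are hypothetical, and the rule that a leading “Re:” marks a review reply is an assumption. Bug reports carry no standard tag, so they land in the catch-all bucket.

```python
import re
from collections import Counter

# Bracketed tags such as "[PATCH]", "[PATCH 2/5]" or "[RFC PATCH v2]".
TAG_RE = re.compile(r"\[([^\]]*)\]")
# Assumption: a subject starting with "Re:" is a reply (review or discussion).
REPLY_RE = re.compile(r"^\s*re\s*:", re.IGNORECASE)


def classify_subject(subject):
    """Return 'review', 'rfc', 'patch', or 'other' for one subject line."""
    if REPLY_RE.match(subject):
        return "review"
    # Collect the alphabetic words found inside every bracketed tag.
    words = set()
    for tag in TAG_RE.findall(subject):
        words.update(re.findall(r"[A-Z]+", tag.upper()))
    if "RFC" in words:
        return "rfc"
    if "PATCH" in words:
        return "patch"
    return "other"


def count_categories(subjects):
    """Tally how many subject lines fall into each category."""
    return Counter(classify_subject(s) for s in subjects)
```

Replies are checked first so that “Re: [PATCH] ...” counts as review rather than as a second code submission.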
Because these messages capture the day-to-day work of software engineering, I believe that message counts for mailing lists dedicated to software development provide a good indication of the health of a development community. For KVM, the statistics are impressive.
- Almost 400 organizations participated in KVM development, ranging from large corporations such as IBM, Intel, and Red Hat, to academic institutions and individual contributors.
- Approximately 800 unique contributors. This is an extremely broad group of software developers.
- A solid core of “super contributors,” developers who form the top tier of the project’s contributors.
Top Individual Contributors
It’s also good to look at the top individual contributors. These are the folks who are generally 100% focused on the project and are the most prolific programmers.
KVM-Devel
3810 avi redhat
1261 mst redhat
851 gleb redhat
799 mtosatti redhat
507 ghaskins novell
453 anthony codemonkey
410 lmr redhat
394 agraf suse
362 sheng linux intel
357 jan.kiszka siemens
356 glommer redhat
336 mgoldish redhat
226 amit.shah redhat
223 jan.kiszka web de
209 markmc redhat
208 alex.williamson hp
197 joerg.roedel amd
178 mhiramat redhat
Qemu
1839 anthony codemonkey
1457 kraxel redhat
1447 quintela redhat
961 avi redhat
819 aurelien aurel32
805 lcapitulino redhat
805 blauwirbel gmail
745 mst redhat
617 yamahata valinux
565 agraf suse
558 aliguori ibm
540 jan.kiszka siemens
493 markmc redhat
468 paul codesourcery
425 gleb redhat
418 jamie shareable
407 av1474 comtv
402 armbru redhat
391 glommer redhat
371 kwolf redhat
Comments on Individual Developer Counts
Looking at the individual contributor counts for KVM and Qemu, it is clear that the top contributor on each list is quite a bit more active than the next highest contributor. (On the Qemu list, anthony codemonkey and aliguori ibm are the same person posting under two different addresses, which is not uncommon. This means Anthony’s actual message count is close to 2,400: 1839 + 558 = 2397.) Avi Kivity is the KVM maintainer, and Anthony Liguori is the Qemu maintainer. It’s the job of the maintainer to review and accept all code submissions, and to package and announce new releases of the code. So you expect the maintainer of a project to post the most messages.
You can also see from these message counts that there is a large overlap of top contributors to the KVM and Qemu projects. In fact, Avi Kivity is a top contributor to Qemu, and Anthony Liguori is a top contributor to KVM.
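Both observations, the duplicate address and the overlap between the two lists, can be checked mechanically. The Python sketch below copies the counts from the two lists in this post. The helper names and the alias table are hypothetical, and the only alias it merges is the one stated above (aliguori ibm and anthony codemonkey). Whether other similar-looking senders are the same person is left open.

```python
from collections import Counter

# Counts copied from the KVM-Devel and Qemu lists in this post.
KVM_DEVEL = (
    "3810 avi redhat 1261 mst redhat 851 gleb redhat 799 mtosatti redhat "
    "507 ghaskins novell 453 anthony codemonkey 410 lmr redhat 394 agraf suse "
    "362 sheng linux intel 357 jan.kiszka siemens 356 glommer redhat "
    "336 mgoldish redhat 226 amit.shah redhat 223 jan.kiszka web de "
    "209 markmc redhat 208 alex.williamson hp 197 joerg.roedel amd "
    "178 mhiramat redhat"
)
QEMU = (
    "1839 anthony codemonkey 1457 kraxel redhat 1447 quintela redhat "
    "961 avi redhat 819 aurelien aurel32 805 lcapitulino redhat "
    "805 blauwirbel gmail 745 mst redhat 617 yamahata valinux 565 agraf suse "
    "558 aliguori ibm 540 jan.kiszka siemens 493 markmc redhat "
    "468 paul codesourcery 425 gleb redhat 418 jamie shareable "
    "407 av1474 comtv 402 armbru redhat 391 glommer redhat 371 kwolf redhat"
)

# Only the alias stated in the post; extend this table at your own risk.
ALIASES = {"aliguori ibm": "anthony codemonkey"}


def parse_counts(text):
    """Parse 'count sender-tokens ...' runs into a Counter keyed by sender."""
    counts = Counter()
    number, parts = None, []
    for token in text.split():
        if token.isdigit():  # a purely numeric token starts a new entry
            if number is not None:
                counts[" ".join(parts)] += number
            number, parts = int(token), []
        else:
            parts.append(token)
    if number is not None:
        counts[" ".join(parts)] += number
    return counts


def merge_aliases(counts, aliases):
    """Fold the counts of alias senders into their canonical sender."""
    merged = Counter()
    for sender, n in counts.items():
        merged[aliases.get(sender, sender)] += n
    return merged


kvm = merge_aliases(parse_counts(KVM_DEVEL), ALIASES)
qemu = merge_aliases(parse_counts(QEMU), ALIASES)
overlap = sorted(set(kvm) & set(qemu))

print(qemu["anthony codemonkey"])
print(overlap)
```

With the counts above, the merged Qemu total for Anthony is 2397, and eight senders appear on both top-contributor lists. Because the lists only show the top contributors, the real overlap across all participants is likely larger.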
See for Yourself
You can read the kvm-devel and Qemu mailing lists via the web using Gmane.
Source code for Analysis Tools
I wrote a couple of crude utilities to do this mailing list analysis. They are:
- string_search.c: a utility that understands email address strings and can process them and count instances of specific addresses in a file.
- mbox-filter.py: a Python utility that filters an mbox-formatted email file in a number of different ways. I use it, for example, to collect all messages that fall within a certain range of dates.
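For readers who want to try something similar before those utilities are documented, here is a minimal Python sketch of the same two steps: keep the messages inside a date range, then count messages per sender address. It is not the code of string_search.c or mbox-filter.py. The function names are hypothetical, and treating a Date header without a timezone as UTC is an assumption. It uses only the standard library modules mailbox and email.utils.

```python
import mailbox
from collections import Counter
from datetime import datetime, timezone
from email.utils import parseaddr, parsedate_to_datetime


def message_date(msg):
    """Return the Date header as an aware UTC datetime, or None if unusable."""
    raw = msg.get("Date")
    if not raw:
        return None
    try:
        parsed = parsedate_to_datetime(str(raw))
    except (TypeError, ValueError):
        return None  # malformed Date header
    if parsed.tzinfo is None:
        # Assumption: a date without a timezone is treated as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def count_senders(mbox_path, start, end):
    """Count messages per sender address with start <= Date < end."""
    counts = Counter()
    for msg in mailbox.mbox(mbox_path):
        when = message_date(msg)
        if when is None or not (start <= when < end):
            continue
        _name, address = parseaddr(str(msg.get("From", "")))
        if address:
            counts[address.lower()] += 1
    return counts
```

To cover calendar year 2009, pass datetime(2009, 1, 1, tzinfo=timezone.utc) as start and datetime(2010, 1, 1, tzinfo=timezone.utc) as end, then call most_common() on the result to list the most active senders.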
Perhaps in a future post I’ll document these utilities and enhance them. They are very crude at this point. Once I got the info I needed out of the mbox files I stopped working on them.
