Some KVM development community stats
Today I made a presentation (pdf) on the Linux Kernel Virtual Machine to the Red Hat Cloud Computing Forum. I enjoyed the format. All the presentations were short (30 minutes including Q&A) and technical. This is the type of forum I enjoy attending, so it’s easy to prepare and I am comfortable with the audience.
KVM Developer Participation
One of the topics I covered in the presentation is the level of KVM development activity in 2009. To measure the depth and breadth of participation in KVM development, I used activity on the developer mailing lists for the three primary components of the KVM hypervisor: KVM, which provides the virtual machine monitor; Qemu, which provides the virtual machine environment; and libvirt, which provides the low-level management interfaces.
I like to think that monitoring the traffic on an open-source project’s mailing list is a lot like gathering intelligence through traffic analysis. You can learn who is working on a project, what specific areas they are working in, and with whom they are working. The volume of traffic is also a good indicator of the weight behind a project and the overall development velocity. If you were really ambitious you could graph the relationships among various projects based on the participation of specific individuals.
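The “with whom” part of this traffic analysis can be approximated from message headers alone. The Python sketch below is illustrative and is not one of the utilities listed at the end of this post: it assumes each reply carries an In-Reply-To header naming its parent’s Message-ID, and it counts how often one address replies to another. The mbox file name in the usage comment is hypothetical.

```python
import mailbox
import sys
from collections import Counter
from email.utils import parseaddr
from typing import Iterable, Mapping


def sender(msg: Mapping) -> str:
    """Return the bare, lower-cased address from a message's From: header."""
    return parseaddr(str(msg.get("From", "")))[1].lower()


def reply_pairs(messages: Iterable[Mapping]) -> Counter:
    """Count (replier, original author) pairs by matching In-Reply-To to Message-ID."""
    messages = list(messages)
    # Index every message's author by its Message-ID.
    author_of = {}
    for msg in messages:
        msg_id = str(msg.get("Message-ID", "")).strip()
        if msg_id:
            author_of[msg_id] = sender(msg)
    pairs = Counter()
    for msg in messages:
        parent = str(msg.get("In-Reply-To", "")).strip()
        original = author_of.get(parent)
        replier = sender(msg)
        # Skip replies to unknown parents and people replying to themselves.
        if original and replier and original != replier:
            pairs[(replier, original)] += 1
    return pairs


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python reply_pairs.py kvm-devel-2009.mbox
    box = mailbox.mbox(sys.argv[1], create=False)
    for (replier, original), n in reply_pairs(box).most_common(10):
        print(f"{n:6d}  {replier} -> {original}")
```

Real In-Reply-To headers are sometimes missing or list several IDs, so a fuller version would also consult the References header.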
Below is a summary of the raw statistics for 2009:
What can we learn from Raw Message Counts?
These three mailing lists are dedicated to development activity. The analysis includes three types of messages:
- Source code. All source code changes are submitted as email messages. Per conventions for Linux kernel development, the subject line of these messages usually includes the tag “[PATCH].”
- Source code review. When a developer submits some proposed changes, analysis and discussion of the source code generates replies to the original email. Developers use email clients that support message threading, which makes it easier to follow the discussion.
- Bug reports.
All three of the activities are part of what we typically consider the job description of a software engineer. There is another sub-category of messages that intermix design proposals with source code submissions. These messages usually include the tag “[RFC]” in the subject line.
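These subject-line conventions make a rough, automatic classification possible. The Python sketch below assumes subjects follow the [PATCH]/[RFC] tagging described above and that replies are marked with “Re:”, possibly after a list prefix such as “[Qemu-devel]”. Bug reports carry no standard tag, so they fall into the catch-all buckets; the category names are illustrative labels, not anything the lists themselves define.

```python
import re

# Bracketed tags such as "[PATCH v2 3/7]", "[RFC PATCH]" or a list prefix like "[Qemu-devel]".
_TAG = re.compile(r"\[([^\]]*)\]")
# "Re:" at the start of the subject or right after a bracketed list prefix.
_REPLY = re.compile(r"(?:^|\]\s*)re\s*:", re.IGNORECASE)


def classify_subject(subject: str) -> str:
    """Classify a subject as 'patch', 'rfc', 'review', 'discussion' or 'other'."""
    s = subject.strip()
    tags = " ".join(_TAG.findall(s)).upper()
    if "RFC" in tags:
        kind = "rfc"
    elif "PATCH" in tags:
        kind = "patch"
    else:
        kind = "other"
    if _REPLY.search(s):
        # A reply in a [PATCH] or [RFC] thread is treated as code review.
        return "review" if kind != "other" else "discussion"
    return kind
```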
Because these messages capture the day-to-day work of software engineering, I believe that message counts for mailing lists dedicated to software development provide a good indication of the health of a development community. For KVM, the statistics are impressive.
- Almost 400 organizations participated in KVM development, ranging from large corporations such as IBM, Intel, and Red Hat, to academic institutions and individual contributors.
- Approximately 800 unique contributors. This is an extremely broad group of software developers.
- A solid core of “super contributors,” developers who form the top tier of the project contributions.
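Totals like these can be estimated from the From: headers of an archive. The sketch below is one plausible approach, assumed here and not necessarily how the figures above were produced: it counts distinct sender addresses and treats each address’s domain as a stand-in for the organization. As the comments note, that proxy is crude, and people who post from several addresses inflate the contributor count.

```python
from collections import Counter
from email.utils import parseaddr
from typing import Iterable


def participation_stats(from_headers: Iterable[str], top_n: int = 10) -> dict:
    """Summarize a list's senders: unique addresses, unique domains, busiest posters."""
    senders = Counter()
    for header in from_headers:
        addr = parseaddr(header)[1].lower()
        if "@" in addr:  # skip malformed or empty From: values
            senders[addr] += 1
    # Rough organization proxy: the part after '@'. It over-counts (us.ibm.com vs
    # ibm.com) and under-counts (everyone on gmail.com looks like one organization).
    domains = {addr.rsplit("@", 1)[1] for addr in senders}
    return {
        "contributors": len(senders),
        "domains": len(domains),
        "top": senders.most_common(top_n),
    }
```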
Top Individual Contributors
It’s also good to look at the top individual contributors. These are the folks who are generally 100% focused on the project and are the most prolific programmers.
KVM-Devel
3810 avi@redhat.com
1261 mst@redhat.com
851 gleb@redhat.com
799 mtosatti@redhat.com
507 ghaskins@novell.com
453 anthony@codemonkey.ws
410 lmr@redhat.com
394 agraf@suse.de
362 sheng@linux.intel.com
357 jan.kiszka@siemens.com
356 glommer@redhat.com
336 mgoldish@redhat.com
226 amit.shah@redhat.com
223 jan.kiszka@web.de
209 markmc@redhat.com
208 alex.williamson@hp.com
197 joerg.roedel@amd.com
178 mhiramat@redhat.com
Qemu
1839 anthony@codemonkey.ws
1457 kraxel@redhat.com
1447 quintela@redhat.com
961 avi@redhat.com
819 aurelien@aurel32.net
805 lcapitulino@redhat.com
805 blauwirbel@gmail.com
745 mst@redhat.com
617 yamahata@valinux.co.jp
565 agraf@suse.de
558 aliguori@us.ibm.com
540 jan.kiszka@siemens.com
493 markmc@redhat.com
468 paul@codesourcery.com
425 gleb@redhat.com
418 jamie@shareable.org
407 av1474@comtv.ru
402 armbru@redhat.com
391 glommer@redhat.com
371 kwolf@redhat.com
Comments on Individual Developer Counts
Looking at the individual contributor counts for KVM and Qemu, it is clear that the top contributor on each list is quite a bit more active than the next-highest contributor. (On the Qemu list, anthony@codemonkey.ws and aliguori@us.ibm.com are the same person posting under two different addresses, which is not uncommon. This means Anthony’s actual message count is close to 2,400 messages: 1,839 + 558 = 2,397.) Avi Kivity is the KVM maintainer, and Anthony Liguori is the Qemu maintainer. It’s the job of the maintainer to review and accept all code submissions, and to package and announce new releases of the code. So you would expect the maintainer of a project to post the most messages.
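Folding one person’s addresses together is a simple post-processing step on the counts. This Python sketch uses a hand-maintained alias table (the only entry is the pair identified above) and a few of the 2009 Qemu counts from the table; the function name is illustrative.

```python
from collections import Counter
from typing import Mapping


def merge_aliases(counts: Mapping[str, int], aliases: Mapping[str, str]) -> Counter:
    """Fold counts for secondary addresses into each person's canonical address."""
    merged = Counter()
    for addr, n in counts.items():
        merged[aliases.get(addr, addr)] += n
    return merged


# Alias table maintained by hand; this entry is the one identified above.
ALIASES = {"aliguori@us.ibm.com": "anthony@codemonkey.ws"}

# A few of the 2009 Qemu counts from the table above.
qemu = Counter({
    "anthony@codemonkey.ws": 1839,
    "kraxel@redhat.com": 1457,
    "quintela@redhat.com": 1447,
    "aliguori@us.ibm.com": 558,
})

merged = merge_aliases(qemu, ALIASES)
print(merged.most_common(2))  # [('anthony@codemonkey.ws', 2397), ('kraxel@redhat.com', 1457)]
```

Deciding which addresses belong together still takes human judgment; the code only applies the mapping.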
You can also see from these message counts that there is a large overlap of top contributors to the KVM and Qemu projects. In fact, Avi Kivity is a top contributor to Qemu, and Anthony Liguori is a top contributor to KVM.
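One simple way to put a number on that overlap is to intersect the sets of top posters on each list. This sketch uses the first eight entries of each table above; the cutoff of eight is arbitrary, and the comparison ignores aliases such as aliguori@us.ibm.com unless they are merged first.

```python
from collections import Counter


def top_overlap(a: Counter, b: Counter, n: int) -> set:
    """Addresses that appear among the top-n posters of both lists."""
    return {addr for addr, _ in a.most_common(n)} & {addr for addr, _ in b.most_common(n)}


# Top eight posters on each list in 2009, from the tables above.
kvm = Counter({
    "avi@redhat.com": 3810, "mst@redhat.com": 1261, "gleb@redhat.com": 851,
    "mtosatti@redhat.com": 799, "ghaskins@novell.com": 507,
    "anthony@codemonkey.ws": 453, "lmr@redhat.com": 410, "agraf@suse.de": 394,
})
qemu = Counter({
    "anthony@codemonkey.ws": 1839, "kraxel@redhat.com": 1457,
    "quintela@redhat.com": 1447, "avi@redhat.com": 961,
    "aurelien@aurel32.net": 819, "lcapitulino@redhat.com": 805,
    "blauwirbel@gmail.com": 805, "mst@redhat.com": 745,
})

print(sorted(top_overlap(kvm, qemu, 8)))
# ['anthony@codemonkey.ws', 'avi@redhat.com', 'mst@redhat.com']
```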
See for Yourself
You can read the kvm-devel and Qemu mailing lists via the web using Gmane.
Source code for Analysis Tools
I wrote a couple of crude utilities to do this mailing list analysis. They are:
- string_search.c: a C utility that parses email address strings and counts instances of specific addresses in a file.
- mbox-filter.py: a Python utility that filters an mbox-formatted email file in a number of different ways. I use it, for example, to collect all messages that fall within a certain range of dates.
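This is not string_search.c itself but a Python approximation of the same idea, kept in one language with the other sketches here: it scans the From: header lines of an mbox file and counts each address, optionally restricted to a set of addresses of interest. The regular expression is deliberately simple, header lines that wrap are not handled, and a forwarded message quoting a From: line at the start of a line would be over-counted. The file name in the usage comment is hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, Optional, Set

# Deliberately simple address pattern; it does not cover every RFC 5322 form.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def count_addresses(lines: Iterable[str], wanted: Optional[Set[str]] = None,
                    prefix: str = "From:") -> Counter:
    """Count addresses on lines starting with `prefix`; `wanted` must be lower-case."""
    counts = Counter()
    for line in lines:
        if not line.startswith(prefix):
            continue  # skips Cc: lines, most body text and mbox "From " separators
        for match in EMAIL_RE.finditer(line):
            addr = match.group(0).lower()
            if wanted is None or addr in wanted:
                counts[addr] += 1
    return counts


# Hypothetical usage:
# with open("kvm-devel-2009.mbox", encoding="utf-8", errors="replace") as f:
#     print(count_addresses(f, wanted={"avi@redhat.com"}))
```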
Perhaps in a future post I’ll document these utilities and enhance them. They are very crude at this point. Once I got the info I needed out of the mbox files I stopped working on them.
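Along the same lines, here is a hedged Python sketch of the date-range filtering described for mbox-filter.py; it is not the script itself. It uses the standard library’s mailbox module and email.utils.parsedate_to_datetime, drops messages whose Date: header is missing or unparseable, and assumes UTC for dates that carry no time zone. The paths in the usage comment are hypothetical.

```python
import mailbox
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def in_range(msg, start: datetime, end: datetime) -> bool:
    """True if the message's Date: header falls in [start, end)."""
    raw = msg.get("Date")
    if not raw:
        return False
    try:
        sent = parsedate_to_datetime(str(raw))
    except (TypeError, ValueError):  # malformed dates; older Pythons raise TypeError
        return False
    if sent.tzinfo is None:  # e.g. "-0000" offsets; treat as UTC
        sent = sent.replace(tzinfo=timezone.utc)
    return start <= sent < end


def filter_mbox(src_path: str, dst_path: str, start: datetime, end: datetime) -> int:
    """Copy messages dated in [start, end) from src_path into dst_path."""
    src = mailbox.mbox(src_path, create=False)
    dst = mailbox.mbox(dst_path)
    kept = 0
    try:
        dst.lock()
        for msg in src:
            if in_range(msg, start, end):
                dst.add(msg)
                kept += 1
    finally:
        dst.close()  # flushes, unlocks and closes the output file
        src.close()
    return kept


# Hypothetical usage: keep only 2009 traffic.
# filter_mbox("kvm-devel-all.mbox", "kvm-devel-2009.mbox",
#             datetime(2009, 1, 1, tzinfo=timezone.utc),
#             datetime(2010, 1, 1, tzinfo=timezone.utc))
```

Using a half-open interval [start, end) means a message stamped exactly at midnight on January 1, 2010 is left out of the 2009 file.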
