
September 28, 2005

Asay: More on how open source actually operates

Audris Mockus (Avaya Labs), Roy Fielding (eBuilt), and James Herbsleb (Bell Laboratories) have posted interesting research on how open source development communities work, using Apache and Mozilla to frame hypotheses of successful open source projects. You can find the paper - "Two Case Studies of Open Source Software Development: Apache and Mozilla" - here. It's worth a read, especially if you're a commercial entity (or a VC investing in such) looking to build an open source community around your company's project.

The authors of the report list a few hypotheses developed from their analysis of Apache, and then refine them in light of their Mozilla analysis (and, unless I misread, they also tested them against a few other projects). The results are interesting:

Hypothesis 1a: Open source developments will have a core of developers who control the code base, and will create approximately 80% or more of the new functionality. If this core group uses only informal, ad hoc means of coordinating their work, it will be no larger than 10-15 people.

Comment: See the chart at right. As the researchers note,

[Chart: Apache - Core developers do 83 percent of work]

The chart "shows that the top 15 developers contributed more than 83% of the MRs [modification requests] and deltas [an MR generates one delta for each of the files it changes], 88% of added lines and 91% of deleted lines. Very little code and, presumably, correspondingly small effort is spent by non-core developers (for simplicity, in this section we refer to all the developers outside the top 15 group as non-core). The MRs done by core developers are substantially larger than those done by the non-core group." (14)
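If you want to run the same check against your own project, the "top 15 did 83%" figure is just a concentration measure over the change log. Here is a minimal sketch in Python, assuming you can export one author name per change; the function name `top_k_share` and the toy log are my own illustration, not anything taken from the paper.

```python
from collections import Counter


def top_k_share(authors, k):
    """Return the fraction of changes made by the k most active authors.

    `authors` holds one entry per change (e.g. per MR or commit),
    naming whoever made that change.
    """
    counts = Counter(authors)
    if not counts:
        return 0.0
    # Sum the change counts of the k most active authors.
    top = sum(n for _, n in counts.most_common(k))
    return top / len(authors)


# Hypothetical toy log (not data from the paper): 100 changes, 5 authors.
log = ["ann"] * 50 + ["bob"] * 30 + ["cy"] * 10 + ["dee"] * 6 + ["eli"] * 4

print(f"Top 2 authors made {top_k_share(log, 2):.0%} of the changes")
```

Swap in a real export of your version control history and set `k` to the size of the group you consider your core.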

See more at AC/OS.

Hypothesis 2a: If a project is so large that more than 10-15 people are required to complete 80% of the code in the desired time frame, then other mechanisms, rather than just informal, ad hoc arrangements, will be required in order to coordinate the work. These mechanisms may include one or more of the following: explicit development processes, individual or group code ownership, and required inspections.

Hypothesis 3: In successful open source developments, a group larger by an order of magnitude than the core will repair defects, and a yet larger group (by another order of magnitude) will report problems.

As the authors report,

[Chart: Apache - Spread of Bug Fixes]

"[The chart at right] shows that participation of wider development community is more significant in defect repair than in the development of new functionality. The top 15 contributors produced only 66% of the fixes. The participation rate was 26 developers per 100 fixes and 4 developers per 100 code submissions, i.e., more than six times lower for fixes. These results indicate that despite broad overall participation in the project, almost all new functionality is implemented and maintained by the core group." (15)

The authors further found that "of the top 15 problem reporters only three are also core developers [on Apache]. It shows that the significant role of system tester is reserved almost exclusively to the wide community of Apache users." (18) This is interesting, because it points to the importance of developing a wide user base (which, as I indicated in my research posting yesterday, generally follows from the core team developing a significant body of code that others can use). That seems logical, but the vast majority of projects on SourceForge.net apparently don't follow that logic. It also points to the need to free up core developers' time for writing new functionality: get a large body of beta testers, as MySQL, JBoss, and others have done.
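The "participation rate" in that quote reads to me as distinct developers per 100 changes, and the two quoted rates (26 and 4) work out to a ratio of 6.5. A small Python sketch of that arithmetic follows, assuming one author name per change; `participation_rate` and the sample list are hypothetical names of mine, and only the 26 and 4 come from the paper.

```python
def participation_rate(authors):
    """Distinct developers per 100 changes, given one author per change."""
    if not authors:
        return 0.0
    return 100 * len(set(authors)) / len(authors)


# Rates quoted from the paper for Apache.
fixes_rate = 26.0  # developers per 100 fixes
code_rate = 4.0    # developers per 100 code submissions

# Breadth of participation in fixes relative to new code.
ratio = fixes_rate / code_rate
print(ratio)  # 6.5

# Hypothetical toy log: 4 changes by 3 distinct people.
print(participation_rate(["ann", "bob", "ann", "cy"]))  # 75.0
```

In other words, per unit of work, bug fixing drew on about six and a half times as many distinct people as new code did.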

Hypothesis 4: Open source developments that have a strong core of developers but never achieve large numbers of contributors beyond that core will be able to create new functionality but will fail because of a lack of resources devoted to finding and repairing defects.

Hypothesis 6: In successful open source developments, the developers will also be users of the software.

As the authors note,

"The reasoning behind this hypothesis was that low defect densities are achieved because developers are users of the software, hence they have considerable domain expertise. This puts them at a substantial advantage relative to many commercial developers who vary greatly in their domain expertise. This certainly appears to be true in the Mozilla case. While we did not have data on Mozilla use by Mozilla developers, it is wildly implausible to suggest that the developers were not experienced browser users, hence, "domain experts" in the sense of this hypothesis." (37-38)

The researchers make points about open source code quality, defect resolution response time, etc., but these are of secondary importance to me, so you'll have to read the full paper to see what they say.

Some parting thoughts:

  • As the researchers found in Mozilla's case, good documentation, tutorials, and refined development tools and processes can help grow a community. It's tough for people to contribute if they haven't a clue as to where to begin....
  • In larger projects (like Mozilla, unlike Apache - Apache's core code is kept very lean, with all new functionality farmed out to separate modules/projects), modularity is critical. Unfortunately, for projects that start out commercial and then try to go open source, the code base is often riddled with interdependencies (as was the case with Mozilla). Mozilla found a way around this by establishing module-by-module code ownership, with each owner sufficiently knowledgeable about her module to ensure code conflicts don't arise within it. These individual owners must then carefully coordinate with other module owners. That is unlike a small core project such as Apache, where the dozen core contributors are well aware of what's going on across the core, which lets them contribute to various pieces of code within it.
  • As in commercial software, there is no free lunch. If you want people to use your code, you have to spend the time and effort to build something worth downloading, using, and commenting on. It's no easier than commercial software, and requires an equivalent amount of work. The payoff, however, is a user base that feels ownership in the project, rather than one made up of mere buyers. That's valuable.

Posted by Matt Asay on September 28, 2005 11:08 AM


COMMENTS




I'm a non-technical person trying to understand how to choose software with others who are committed to finding a commercial solution. I believe that open source ERP (an end-to-end business system) is reaching the point that it can be considered for our business. Compiere, a former commercial product, is tightly controlled, and all code seems to be written in-house. OFBiz seems to have the right organization, but may not have the necessary participants. This is a difficult decision as we look at many commercial products versus open source ones that are not yet fully featured. Should I try to convince others to consider waiting for products that may have the required features in somewhere between a year and never?

Posted by: Bill Parks at October 26, 2005 03:13 PM
