Big Data = Big Insights? Operationalising Brooks' Law in a Massive GitHub Data Set
Christoph Gote, Pavlin Mavrodiev, Frank Schweitzer, Ingo Scholtes

TL;DR
This paper investigates how team size influences developer productivity in large GitHub datasets, addressing conflicting findings and emphasizing the importance of metric choice and data analysis pitfalls.
Contribution
It provides the largest curated GitHub dataset for studying team size effects and discusses challenges and pitfalls in big data analysis of software productivity.
Findings
Larger datasets do not necessarily yield more reliable insights.
Discrepancies in productivity findings are linked to metric and methodology choices.
The study offers a curated dataset for future research on team size and productivity.
Abstract
Massive data from software repositories and collaboration tools are widely used to study social aspects in software development. One question that several recent works have addressed is how a software project's size and structure influence team productivity, a question famously considered in Brooks' law. Recent studies using massive repository data suggest that developers in larger teams tend to be less productive than those in smaller teams. Despite using similar methods and data, other studies argue for a positive linear or even super-linear relationship between team size and productivity, thus contesting the view of software economics that software projects are diseconomies of scale. In our work, we study challenges that can explain the disagreement between recent studies of developer productivity in massive repository data. We further provide, to the best of our knowledge, the largest,…
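The distinction the abstract draws between sub-linear, linear, and super-linear scaling can be illustrated with a small sketch. It assumes a power-law model, output ≈ c · (team size)^alpha, and estimates alpha as the least-squares slope in log-log space. This is a generic illustration, not the authors' method: the function names, the tolerance, and the synthetic data are all hypothetical, and no real repository data is used.

```python
import math


def scaling_exponent(team_sizes, outputs):
    """Least-squares slope of log(output) against log(team size).

    Assumes output ~ c * size**alpha; the returned slope estimates alpha.
    All inputs must be strictly positive.
    """
    xs = [math.log(n) for n in team_sizes]
    ys = [math.log(y) for y in outputs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


def classify(alpha, tol=0.05):
    """Label the scaling regime; tol is an arbitrary illustrative threshold."""
    if alpha > 1 + tol:
        return "super-linear"
    if alpha < 1 - tol:
        return "sub-linear"
    return "linear"


# Synthetic data only (hypothetical): total output for teams of each size.
sizes = [1, 2, 4, 8, 16, 32]
diseconomy = [10 * n ** 0.8 for n in sizes]  # per-developer output falls with size
economy = [10 * n ** 1.2 for n in sizes]     # per-developer output rises with size

alpha_diseconomy = scaling_exponent(sizes, diseconomy)
alpha_economy = scaling_exponent(sizes, economy)
print(classify(alpha_diseconomy), classify(alpha_economy))
```

Under this model, alpha below 1 corresponds to the diseconomies of scale associated with Brooks' law, while alpha above 1 corresponds to the super-linear relationship that other studies report. Which productivity measure is used as the output can change the estimate, consistent with the summary's point that findings depend on metric choice.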
Taxonomy
Topics: Open Source Software Innovations · Software Engineering Research · Online Learning and Analytics
