Assessing the Early Bird Heuristic (for Predicting Project Quality)
N.C. Shrikanth, Tim Menzies

TL;DR
This paper demonstrates that early project data, specifically the first 150 commits, can effectively predict project quality, matching or outperforming more complex models and enabling rapid, early-stage decision-making.
Contribution
It introduces the 'Early Bird' heuristic, showing that simple models trained on initial project data can match or surpass more complex approaches in quality prediction.
Findings
Early project data contains most of the predictive information.
Simple models using early data perform as well as, or better than, complex models.
Early data-based models generalize across hundreds of projects.
Abstract
Before researchers rush to reason across all available data or try complex methods, perhaps it is prudent to first check for simpler alternatives. Specifically, if the historical data has the most information in some small region, perhaps a model learned from that region would suffice for the rest of the project. To support this claim, we offer a case study with 240 projects, where we find that the information in those projects "clumps" towards the earliest parts of the project. A quality prediction model learned from just the first 150 commits works as well as, or better than, state-of-the-art alternatives. Using just this "early bird" data, we can build models very quickly and very early in the project life cycle. Moreover, using this early bird method, we have shown that a simple model (with just a few features) generalizes to hundreds of projects. Based on this experience, we doubt…
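The abstract describes the early bird idea only at a high level. As a rough illustration under stated assumptions, the sketch below shows the general pattern: sort a project's commits chronologically, keep only the first 150 (the "early bird" window), and fit a deliberately simple model on them. The `Commit` fields (`lines_added`, `buggy`), the single-feature threshold learner (a decision stump), and the helper names `early_bird`, `fit_stump`, and `predict` are hypothetical stand-ins chosen for illustration; they are not the features or learner used in the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Commit:
    timestamp: int    # commit time, e.g. Unix seconds (hypothetical field)
    lines_added: int  # hypothetical single feature
    buggy: bool       # assumed label: commit later found to be defect-inducing


EARLY_BIRD_SIZE = 150  # "the first 150 commits" from the abstract


def early_bird(commits: List[Commit], n: int = EARLY_BIRD_SIZE) -> List[Commit]:
    """Return the n earliest commits, in chronological order."""
    return sorted(commits, key=lambda c: c.timestamp)[:n]


def fit_stump(train: List[Commit]) -> int:
    """Fit a one-feature threshold rule: predict buggy when lines_added >= t.

    Tries every observed feature value (plus one value above the maximum,
    which means "predict nothing buggy") and keeps the first threshold
    with the highest training accuracy.
    """
    if not train:
        raise ValueError("need at least one training commit")
    candidates = sorted({c.lines_added for c in train})
    candidates.append(candidates[-1] + 1)
    best_t, best_acc = candidates[0], -1.0
    for t in candidates:
        correct = sum((c.lines_added >= t) == c.buggy for c in train)
        acc = correct / len(train)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


def predict(threshold: int, commit: Commit) -> bool:
    """Apply the fitted threshold rule to a (possibly later) commit."""
    return commit.lines_added >= threshold
```

The model is learned from the early window only and then applied to commits from later in the project; the premise the paper tests is that this early window already carries most of the predictive signal.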
Taxonomy
Topics: Software Engineering Research · Software Engineering Techniques and Practices · Software Reliability and Analysis Research
