Fast and Accurate Heuristics for Bus-Factor Estimation
Sebastiano Antonio Piccolo

TL;DR
This paper introduces two novel heuristics for estimating the bus-factor in large software projects, providing more accurate and scalable results than existing methods, and demonstrates their effectiveness through extensive empirical evaluation.
Contribution
The paper proposes two new approximation heuristics for bus-factor estimation based on graph peeling, improving accuracy and scalability over traditional degree-based heuristics.
Findings
Heuristics outperform degree-based methods in accuracy.
Methods scale to graphs with millions of nodes in minutes.
Heuristics are robust to structural variations.
Abstract
The bus-factor is a critical risk indicator that quantifies how many key contributors a project can afford to lose before core knowledge or functionality is compromised. Despite its practical importance, accurately computing the bus-factor is NP-Hard under established formalizations, making scalable analysis infeasible for large software systems. In this paper, we model software projects as bipartite graphs of developers and tasks and propose two novel approximation heuristics, Minimum Coverage and Maximum Coverage, based on iterative graph peeling, for two influential bus-factor formalizations. Our methods significantly outperform the widely adopted degree-based heuristic, which we show can yield severely inflated estimates. We conduct a comprehensive empirical evaluation on synthetic power-law graphs and demonstrate that our heuristics provide tighter estimates while…
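To make the bipartite developer-task model concrete, the following is a minimal Python sketch of a degree-based greedy baseline of the kind the abstract contrasts against. It is not the paper's Minimum Coverage or Maximum Coverage heuristic, which this summary does not specify. The function name `degree_based_bus_factor`, the edge-list input, and the stopping rule (stop once more than half of the tasks have no remaining developer) are assumptions made for illustration; the paper's exact formalizations may differ.

```python
from collections import defaultdict


def degree_based_bus_factor(edges, threshold=0.5):
    """Greedy degree-based estimate on a bipartite developer-task graph.

    edges: iterable of (developer, task) pairs.
    threshold: ASSUMED stopping rule; removal stops once more than this
        fraction of tasks has no remaining developer.
    Returns the number of developers removed before the rule triggers.
    """
    dev_tasks = defaultdict(set)   # developer -> tasks they cover
    task_devs = defaultdict(set)   # task -> developers covering it
    for dev, task in edges:
        dev_tasks[dev].add(task)
        task_devs[task].add(dev)

    total_tasks = len(task_devs)
    orphaned = 0   # tasks left with no developer
    removed = 0    # developers removed so far

    while dev_tasks and orphaned <= threshold * total_tasks:
        # Pick the remaining developer with the highest degree
        # (ties broken by name so the result is deterministic).
        top = max(dev_tasks, key=lambda d: (len(dev_tasks[d]), str(d)))
        for task in dev_tasks.pop(top):
            task_devs[task].discard(top)
            if not task_devs[task]:
                orphaned += 1
        removed += 1
    return removed


# Hypothetical toy project: alice covers three tasks, bob and carol one each.
toy_edges = [
    ("alice", "t1"), ("alice", "t2"), ("alice", "t3"),
    ("bob", "t3"), ("carol", "t4"),
]
estimate = degree_based_bus_factor(toy_edges)
```

Because the greedy choice looks only at degree, it can remove developers whose tasks are already covered by others, which is one way such a baseline can drift away from the true minimum on larger graphs.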
