A Model of the Commit Size Distribution of Open Source
Carsten Kolassa, Dirk Riehle, Michel A. Salim

TL;DR
This paper develops a statistical model of the distribution of commit sizes in open source projects and validates it across projects of different sizes, with the aim of improving software development tools and our understanding of software development.
Contribution
It introduces a probabilistic model of the commit size distribution that applies to open source projects of different sizes, with its goodness of fit validated through graphical and statistical methods.
Findings
The model accurately fits commit size data across projects.
Commit sizes follow a specific probability distribution, which the model captures.
Validation confirms that the model applies to projects of different sizes.
Abstract
A fundamental unit of work in programming is the code contribution ("commit") that a developer makes to the code base of the project being worked on. We use statistical methods to derive a model of the probabilistic distribution of commit sizes in open source projects, and we show that the model is applicable to different project sizes. We use both graphical and statistical methods to validate the goodness of fit of our model. By measuring and modeling a fundamental dimension of programming, we help improve software development tools and our understanding of software development.
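The abstract describes fitting a probability distribution to commit sizes and checking the goodness of fit both graphically and statistically. The sketch below illustrates that general workflow only; it is not the authors' model. It assumes commit sizes are positive numbers (for example, lines of code changed per commit) and uses a lognormal distribution purely as a stand-in candidate. The Kolmogorov-Smirnov distance stands in for the statistical check and Q-Q plot coordinates for the graphical one. The function names `fit_lognormal`, `ks_statistic`, and `qq_points` are hypothetical.

```python
import math
from statistics import NormalDist, fmean, pstdev


def fit_lognormal(sizes):
    """Maximum-likelihood fit of a lognormal to positive commit sizes.

    NOTE: the lognormal is an illustrative choice here, not the
    distribution derived in the paper. Its MLE parameters are the mean
    and population standard deviation of the log-transformed sample.
    """
    if not sizes:
        raise ValueError("need at least one commit size")
    if any(s <= 0 for s in sizes):
        raise ValueError("commit sizes must be positive to take logarithms")
    logs = [math.log(s) for s in sizes]
    mu, sigma = fmean(logs), pstdev(logs)
    if sigma == 0:
        raise ValueError("need at least two distinct commit sizes")
    return mu, sigma


def ks_statistic(sizes, mu, sigma):
    """Kolmogorov-Smirnov distance between the empirical CDF and the fitted CDF."""
    model = NormalDist(mu, sigma)
    xs = sorted(sizes)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = model.cdf(math.log(x))  # lognormal CDF = normal CDF of log(x)
        # The empirical CDF jumps from (i-1)/n to i/n at x; check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def qq_points(sizes, mu, sigma):
    """(model quantile, observed size) pairs; points near the diagonal suggest a good fit."""
    model = NormalDist(mu, sigma)
    xs = sorted(sizes)
    n = len(xs)
    return [(math.exp(model.inv_cdf((i - 0.5) / n)), x)
            for i, x in enumerate(xs, start=1)]


if __name__ == "__main__":
    # Hypothetical commit sizes, for illustration only (not data from the paper).
    sizes = [1, 2, 2, 3, 5, 8, 13, 40, 120, 600]
    mu, sigma = fit_lognormal(sizes)
    print("mu =", mu, "sigma =", sigma)
    print("KS distance =", ks_statistic(sizes, mu, sigma))
    for expected, observed in qq_points(sizes, mu, sigma):
        print(f"{expected:10.2f} {observed:6d}")
```

A smaller KS distance indicates a closer fit; in practice one would compare several candidate distributions this way and plot the Q-Q points on log-scaled axes, since commit sizes are typically heavily skewed.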
