Software Libraries and Their Reuse: Entropy, Kolmogorov Complexity, and Zipf's Law
Todd L. Veldhuizen

TL;DR
This paper models software reuse using information theory and Kolmogorov complexity, arguing that the potential for reuse is determined primarily by the intrinsic diversity of the problem domain rather than by tools or culture.
Contribution
It introduces an entropy-based measure of domain diversity that bounds software reuse potential and provides a theoretical framework for understanding reuse limitations.
Findings
Low-entropy domains enable high reuse and large components.
High-entropy domains require mostly new code, with limited reuse.
Empirical results from Unix platforms support the model's predictions.
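The contrast between low- and high-entropy domains can be made concrete with a toy calculation. The sketch below is not the paper's entropy parameter. It computes ordinary Shannon entropy over a hypothetical Zipf-like distribution of component usage, normalized to the range [0, 1]. The function names, the component count, and the exponent values are all assumptions chosen for illustration.

```python
import math


def zipf_weights(n, s):
    """Probabilities p_k proportional to 1 / k**s for ranks k = 1..n.

    n and s are illustrative parameters, not values taken from the paper.
    """
    raw = [1.0 / (k ** s) for k in range(1, n + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def shannon_entropy(probs):
    """Shannon entropy, in bits, of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def normalized_entropy(probs):
    """Entropy divided by its maximum, log2(len(probs)); lies in [0, 1]."""
    if len(probs) < 2:
        return 0.0
    return shannon_entropy(probs) / math.log2(len(probs))


if __name__ == "__main__":
    # s = 0 is a uniform distribution; larger s concentrates usage
    # on a few top-ranked components.
    for s in (0.0, 1.0, 2.0):
        h = normalized_entropy(zipf_weights(1000, s))
        print(f"s={s}: normalized entropy = {h:.3f}")
```

With s = 0 every hypothetical component is used equally often and the normalized entropy is at its maximum of 1. As s grows, a few components dominate usage and the entropy falls, which is the toy analogue of a domain where a small library covers most needs.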
Abstract
We analyze software reuse from the perspective of information theory and Kolmogorov complexity, assessing our ability to "compress" programs by expressing them in terms of software components reused from libraries. A common theme in the software reuse literature is that if we can only get the right environment in place (the right tools, the right generalizations, economic incentives, a "culture of reuse"), then reuse of software will soar, with consequent improvements in productivity and software quality. The analysis developed in this paper paints a different picture: the extent to which software reuse can occur is an intrinsic property of a problem domain, and better tools and culture can have only marginal impact on reuse rates if the domain is inherently resistant to reuse. We define an entropy parameter of problem domains that measures program diversity, and…
Taxonomy
Topics: Spreadsheets and End-User Computing · Software Engineering Research · Open Source Software Innovations
