Software Libraries and Their Reuse: Entropy, Kolmogorov Complexity, and Zipf's Law

Todd L. Veldhuizen

arXiv:cs/0508023 · cs.SE · August 31, 2016 · 21 cites

Open Access

TL;DR

This paper models software reuse using information theory and Kolmogorov complexity, showing that the potential for reuse is determined primarily by the intrinsic diversity of the problem domain rather than by tools or culture.

Contribution

It introduces an entropy-based measure of domain diversity that bounds software reuse potential and provides a theoretical framework for understanding reuse limitations.

Findings

01. Low-entropy domains enable high reuse and large components.

02. High-entropy domains require mostly new code with limited reuse.

03. Empirical results from Unix platforms support the model's predictions.
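
As a rough intuition for the low-entropy versus high-entropy contrast in the findings, the sketch below computes a normalized Shannon entropy in [0, 1] over a list of component names. This is only an analogy: the paper's own definition of H is not reproduced on this page, and the function name `normalized_entropy` and the sample component names are hypothetical.

```python
import math
from collections import Counter


def normalized_entropy(items):
    """Shannon entropy of the empirical distribution of `items`, scaled to [0, 1].

    Illustrative stand-in only; not the entropy parameter defined in the paper.
    """
    counts = Counter(items)
    total = sum(counts.values())
    # With zero or one distinct item there is no diversity to measure.
    if len(counts) <= 1:
        return 0.0
    h = -sum((c / total) * math.log2(c / total) for c in counts.values())
    # Divide by the maximum possible entropy for this many distinct items.
    return h / math.log2(len(counts))


# Hypothetical usage data: one component dominates (low diversity).
low_diversity = ["sort"] * 97 + ["parse", "hash", "log"]
# Hypothetical usage data: all components equally common (high diversity).
high_diversity = ["sort", "parse", "hash", "log"] * 25

print(normalized_entropy(low_diversity))
print(normalized_entropy(high_diversity))
```

In this toy setting, the first list scores close to 0 because a single component covers almost every use, while the second scores at the top of the range because no component is more useful to package than any other.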

Abstract

We analyze software reuse from the perspective of information theory and Kolmogorov complexity, assessing our ability to "compress" programs by expressing them in terms of software components reused from libraries. A common theme in the software reuse literature is that if we can only get the right environment in place (the right tools, the right generalizations, economic incentives, a "culture of reuse"), then reuse of software will soar, with consequent improvements in productivity and software quality. The analysis developed in this paper paints a different picture: the extent to which software reuse can occur is an intrinsic property of a problem domain, and better tools and culture can have only marginal impact on reuse rates if the domain is inherently resistant to reuse. We define an entropy parameter $H \in [0, 1]$ of problem domains that measures program diversity, and…

Taxonomy

Topics: Spreadsheets and End-User Computing · Software Engineering Research · Open Source Software Innovations