An Empirical Analysis of the R Package Ecosystem
Ethan Bommarito, Michael J Bommarito II

TL;DR
This paper provides a comprehensive empirical analysis of the R package ecosystem over two decades, revealing growth trends, distribution patterns, and licensing practices across CRAN, Bioconductor, and GitHub.
Contribution
It offers the first large-scale longitudinal empirical summary of the entire R package ecosystem, including detailed metrics and growth analysis.
Findings
Ecosystem growth has been robust, with compound annual growth rates of 29% for active packages, 28% for new releases, and 26% for active maintainers.
The distributions of releases per package, packages per maintainer, and dependency in-degree are highly right-skewed: a small number of packages and maintainers support most of the ecosystem's dependencies.
The majority of packages are released under copyleft licenses or lack licensing information.
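The growth figures are compound annual growth rates (CAGR). The CAGR is the constant yearly rate that carries a starting count to an ending count over a given number of years: (end / start)^(1 / years) - 1. The sketch below shows that arithmetic. The function name and the example counts are hypothetical illustrations and are not data from the paper.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Hypothetical example: a count that grows from 100 to 1,000 over 10 years.
rate = cagr(100, 1000, 10)
print(f"{rate:.1%}")  # prints "25.9%"
```

Because the rate compounds, even modest-looking yearly rates add up quickly. A sustained 29% rate multiplies a count by 1.29 each year.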
Abstract
In this research, we present a comprehensive, longitudinal empirical summary of the R package ecosystem, including not just CRAN, but also Bioconductor and GitHub. We analyze more than 25,000 packages, 150,000 releases, and 15 million files across two decades, providing comprehensive counts and trends for common metrics across packages, releases, authors, licenses, and other important metadata. We find that the historical growth of the ecosystem has been robust under all measures, with a compound annual growth rate of 29% for active packages, 28% for new releases, and 26% for active maintainers. As with many similar social systems, we find a number of highly right-skewed distributions with practical implications, including the distribution of releases per package, packages and releases per author or maintainer, package and maintainer dependency in-degree, and size per package and…
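Dependency in-degree, as mentioned in the abstract, can be read as the number of packages that list a given package as a dependency. This is the in-degree of a graph with an edge from each package to each of its dependencies. The sketch below computes it from a toy dependency map. It also offers one possible maintainer-level variant, which sums the in-degree of every package a maintainer is responsible for. The package names, the maintainer mapping, and the summing rule are all hypothetical assumptions, not the paper's exact method.

```python
from __future__ import annotations

from collections import Counter


def dependency_in_degree(depends: dict[str, list[str]]) -> Counter:
    """For each package, count the distinct other packages that depend on it."""
    in_degree: Counter = Counter()
    for pkg, deps in depends.items():
        for dep in set(deps):  # ignore duplicate listings
            if dep != pkg:     # ignore self-references
                in_degree[dep] += 1
    return in_degree


def maintainer_in_degree(in_degree: Counter, maintainer_of: dict[str, str]) -> Counter:
    """Assumed definition: sum package in-degree over each maintainer's packages."""
    totals: Counter = Counter()
    for pkg, count in in_degree.items():
        maintainer = maintainer_of.get(pkg)
        if maintainer is not None:
            totals[maintainer] += count
    return totals


# Hypothetical toy ecosystem: package -> list of dependencies.
depends = {
    "pkgA": ["core", "utils"],
    "pkgB": ["core"],
    "pkgC": ["core", "utils"],
    "utils": ["core"],
    "core": [],
}
maintainer_of = {"core": "alice", "utils": "alice", "pkgA": "bob"}

in_deg = dependency_in_degree(depends)
print(in_deg.most_common())
print(maintainer_in_degree(in_deg, maintainer_of))
```

In the toy data, one package ("core") and one maintainer ("alice") account for every incoming dependency edge. This is a small-scale picture of the right skew the paper reports.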
Taxonomy
Topics: Scientific Computing and Data Management · Research Data Management Practices · Advanced Data Storage Technologies
