Biomedical Open Source Software: Crucial Packages and Hidden Heroes
Eva Maxfield Brown, Stephan Druskat, Laurent Hébert-Dufresne, James Howison, Daniel Mietchen, Andrew Nesbitt, João Felipe Pimentel, Boris Veytsman

TL;DR
This paper maps the dependency networks of biomedical research software, identifying crucial packages and analyzing their roles across major ecosystems using centrality metrics.
Contribution
It introduces a method to analyze the dependency networks of biomedical software and identifies key packages in major ecosystems.
Findings
Identified critical biomedical software packages using centrality metrics.
Mapped upstream dependencies of software in biomedical research papers.
Analyzed three major ecosystems: PyPI, CRAN, and Bioconductor.
Abstract
Despite the importance of scientific software for research, it is often not formally recognized and rewarded. This is especially true for foundational libraries, which are hidden below packages visible to the users (and thus doubly hidden, since even the packages directly used in research are frequently not visible in the paper). Research stakeholders like funders, infrastructure providers, and other organizations need to understand the complex network of computer programs that contemporary research relies upon. In this work, we use the CZ Software Mentions Dataset to map the upstream dependencies of software used in biomedical papers and find the packages critical to scientific software ecosystems. We propose centrality metrics for the network of software dependencies, analyze three ecosystems (PyPI, CRAN, Bioconductor), and determine the packages with the highest centrality.
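To make the idea of ranking packages by their position in a dependency network concrete, here is a minimal sketch. It does not reproduce the paper's centrality metrics, which this summary does not define. As a stand-in it counts, for each package, how many other packages depend on it directly or indirectly. The package names and edges are hypothetical toy data, not drawn from the CZ Software Mentions Dataset.

```python
from collections import defaultdict

# Hypothetical toy edges: (package, dependency it requires).
# Illustrative names only; not data from the paper.
EDGES = [
    ("analysis_tool", "stats_lib"),
    ("analysis_tool", "plot_lib"),
    ("stats_lib", "array_core"),
    ("plot_lib", "array_core"),
    ("genomics_kit", "stats_lib"),
]


def transitive_dependents(edges):
    """Map each package to the set of packages that rely on it,
    directly or through a chain of dependencies."""
    direct = defaultdict(set)  # dependency -> its direct dependents
    packages = set()
    for pkg, dep in edges:
        direct[dep].add(pkg)
        packages.update((pkg, dep))

    result = {}
    for pkg in packages:
        seen = set()
        stack = list(direct[pkg])
        while stack:  # depth-first walk toward downstream packages
            cur = stack.pop()
            if cur not in seen:
                seen.add(cur)
                stack.extend(direct[cur])
        result[pkg] = seen
    return result


def rank_by_dependents(edges):
    """Return (dependent_count, package) pairs, most depended-upon first.
    This count is an assumed proxy for centrality, not the paper's metric."""
    deps = transitive_dependents(edges)
    return sorted(
        ((len(v), k) for k, v in deps.items()),
        key=lambda t: (-t[0], t[1]),
    )


RANKING = rank_by_dependents(EDGES)
for count, name in RANKING:
    print(f"{name}: {count} transitive dependents")
```

In this toy graph the foundational `array_core` ranks first even though no end-user tool requires it directly, which mirrors the "doubly hidden" libraries the abstract describes.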
