Operationalizing Research Software for Supply Chain Security
Kelechi G. Kalu, Soham Rattan, Taylor R. Schorlemmer, George K. Thiruvathukal, Jeffrey C. Carver, James C. Davis

TL;DR
This paper introduces a taxonomy for research software in the supply chain security context, enabling consistent empirical analysis and security assessment across diverse studies.
Contribution
The paper contributes a harmonized taxonomy of research software, a mapping that translates prior approaches into shared taxonomy dimensions, and an annotated dataset with a labeling codebook and a reproducible labeling pipeline.
Findings
Repository-centric security signals vary across taxonomy-defined clusters.
Taxonomy-aware stratification improves interpretation of security measurements.
Making scope and operational boundaries explicit lets empirical studies of research software security be compared on a consistent basis.
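As a rough illustration of what taxonomy-aware stratification means in practice, the sketch below groups repositories by a taxonomy label and summarizes a security signal per group instead of over the pooled corpus. The cluster names, the `signal` field, and all values are hypothetical placeholders chosen for illustration; they are not taken from the paper's dataset or codebook.

```python
from collections import defaultdict
from statistics import median

# Hypothetical records: cluster labels and signal values are made up
# for illustration and do not come from the paper.
repos = [
    {"name": "repo-a", "cluster": "analysis-library", "signal": 6.5},
    {"name": "repo-b", "cluster": "analysis-library", "signal": 7.0},
    {"name": "repo-c", "cluster": "analysis-library", "signal": 7.5},
    {"name": "repo-d", "cluster": "workflow-script", "signal": 3.0},
    {"name": "repo-e", "cluster": "workflow-script", "signal": 4.0},
]


def stratify(records, label_key, value_key):
    """Group values by taxonomy label and summarize each stratum."""
    groups = defaultdict(list)
    for record in records:
        groups[record[label_key]].append(record[value_key])
    return {
        label: {"n": len(values), "median": median(values)}
        for label, values in groups.items()
    }


# Pooled summary hides the difference between clusters.
pooled_median = median(r["signal"] for r in repos)

# Stratified summary reports one median per taxonomy-defined cluster.
by_cluster = stratify(repos, "cluster", "signal")

print("pooled median:", pooled_median)
for label, summary in sorted(by_cluster.items()):
    print(label, summary)
```

In this toy data the pooled median sits close to one cluster and far from the other, which is the kind of masking that reporting per-cluster summaries is meant to expose.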
Abstract
Empirical studies of research software are hard to compare because the literature operationalizes "research software" inconsistently. Motivated by the research software supply chain (RSSC) and its security risks, we introduce an RSSC-oriented taxonomy that makes scope and operational boundaries explicit for empirical research software security studies. We conduct a targeted scoping review of recent repository mining and dataset construction studies, extracting each work's definition, inclusion criteria, unit of analysis, and identification heuristics. We synthesize these into a harmonized taxonomy and a mapping that translates prior approaches into shared taxonomy dimensions. We operationalize the taxonomy on a large community-curated corpus from the Research Software Encyclopedia (RSE), producing an annotated dataset, a labeling codebook, and a reproducible labeling pipeline.…
