Multi-Artifact Analysis of Self-Admitted Technical Debt in Scientific Software
Eric L. Melin, Nasir U. Eisty, Gregory Watson, Addi Malviya-Thakur

TL;DR
This paper introduces a multi-artifact analysis approach to identify and categorize scientific debt, a domain-specific form of self-admitted technical debt in scientific software, highlighting its prevalence and the need for specialized detection methods.
Contribution
The paper develops a multi-source SATD classifier, constructs and validates a curated dataset of scientific debt, and conducts a practitioner validation to assess the practical relevance of scientific debt in scientific software projects.
Findings
The classifier performs strongly across 900,358 artifacts from 23 open-source SSW projects
SATD is most prevalent in pull requests and issue trackers
Traditional SATD models often miss scientific debt
Abstract
Context: Self-admitted technical debt (SATD) occurs when developers acknowledge shortcuts in code. In scientific software (SSW), such debt poses unique risks to the validity and reproducibility of results. Objective: This study aims to identify, categorize, and evaluate scientific debt, a specialized form of SATD in SSW, and assess the extent to which traditional SATD categories capture these domain-specific issues. Method: We conduct a multi-artifact analysis across code comments, commit messages, pull requests, and issue trackers from 23 open-source SSW projects. We construct and validate a curated dataset of scientific debt, develop a multi-source SATD classifier, and conduct a practitioner validation to assess the practical relevance of scientific debt. Results: Our classifier performs strongly across 900,358 artifacts from 23 SSW projects. SATD is most prevalent in pull requests…
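To make the multi-artifact idea concrete, the following is a minimal sketch of labeling text from the four artifact sources named in the abstract and comparing prevalence per source. It is not the paper's classifier: the cue phrases, the `Artifact` type, and the `label` and `prevalence` functions are hypothetical names chosen for illustration, and a keyword heuristic stands in for the trained model.

```python
import re
from collections import Counter
from dataclasses import dataclass

# The four artifact sources named in the abstract.
ARTIFACT_TYPES = ("code_comment", "commit_message", "pull_request", "issue")

# Hypothetical cue phrases, chosen only for illustration (not from the paper).
SATD_PATTERN = re.compile(
    r"\b(todo|fixme|hack|workaround|temporary)\b", re.IGNORECASE
)
SCIENTIFIC_PATTERN = re.compile(
    r"\b(approximat\w*|assum\w*|tolerance|precision|unvalidated|not validated)\b",
    re.IGNORECASE,
)


@dataclass
class Artifact:
    kind: str  # one of ARTIFACT_TYPES
    text: str


def label(artifact: Artifact) -> str:
    """Return 'scientific_debt', 'satd', or 'none' for one artifact."""
    if artifact.kind not in ARTIFACT_TYPES:
        raise ValueError(f"unknown artifact type: {artifact.kind}")
    has_satd = bool(SATD_PATTERN.search(artifact.text))
    has_science = bool(SCIENTIFIC_PATTERN.search(artifact.text))
    # Assumption: a scientific cue only counts when debt is also admitted.
    if has_satd and has_science:
        return "scientific_debt"
    if has_satd:
        return "satd"
    return "none"


def prevalence(artifacts: list[Artifact]) -> dict[str, float]:
    """Fraction of artifacts per source that carry any admitted debt."""
    totals = Counter(a.kind for a in artifacts)
    flagged = Counter(a.kind for a in artifacts if label(a) != "none")
    return {kind: flagged[kind] / totals[kind] for kind in totals}
```

A generic cue list like `SATD_PATTERN` flags the debt but says nothing about its scientific nature, which is one way to picture why detectors built around traditional SATD categories can overlook the domain-specific kind.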
Taxonomy
Topics: Software Engineering Research · Scientific Computing and Data Management · Open Source Software Innovations
