Exploring Scientific Debt: Harnessing AI for SATD Identification in Scientific Software
Eric L. Melin, Ahmed Musa Awon, Nasir U. Eisty, Neil A. Ernst, Shurui Zhou

TL;DR
This paper investigates Self-Admitted Technical Debt (SATD) in scientific software, revealing its higher prevalence compared to general software and evaluating transformer models for effective SATD identification to improve scientific reliability.
Contribution
It provides the first comparative analysis of SATD in scientific versus general software and evaluates transformer-based models for SATD detection in scientific code comments.
Findings
Scientific software (SSW) contains 9.25x more SATD than general-purpose software.
Transformer models outperform existing SATD detection methods.
Effective SATD identification can enhance scientific software quality.
Abstract
Developers often leave behind clues in their code, admitting where it falls short, known as Self-Admitted Technical Debt (SATD). In the world of Scientific Software (SSW), where innovation moves fast and collaboration is key, such debt is not just common but deeply impactful. As research relies on accurate and reproducible results, accumulating SATD can threaten the very foundations of scientific discovery. Yet, despite its significance, the relationship between SATD and SSW remains largely unexplored, leaving a crucial gap in understanding how to manage SATD in this critical domain. This study explores SATD in SSW repositories, comparing SATD in scientific versus general-purpose open-source software and evaluating transformer-based models for SATD identification. We analyzed SATD in 27 scientific and general-purpose repositories across multiple domains and languages. We fine-tuned and…
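To make the idea of SATD in code comments concrete, here is a minimal Python sketch of a keyword-based flagger and a prevalence comparison between two sets of comments. It is not the paper's method: the study evaluates fine-tuned transformer models, whereas this is a simple pattern-matching baseline. The marker list, the function names, and the sample comments are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical marker list (an assumption, not taken from the paper).
# Transformer-based detectors learn such cues from labeled comments instead.
SATD_MARKERS = re.compile(
    r"\b(TODO|FIXME|HACK|XXX|workaround|temporary)\b", re.IGNORECASE
)


def is_satd(comment: str) -> bool:
    """Return True if the comment contains any of the assumed SATD markers."""
    return SATD_MARKERS.search(comment) is not None


def satd_rate(comments: list[str]) -> float:
    """Fraction of comments flagged as SATD (0.0 for an empty list)."""
    if not comments:
        return 0.0
    return sum(is_satd(c) for c in comments) / len(comments)


# Invented sample comments, for illustration only.
scientific = [
    "# TODO: validate against the analytic solution",
    "# HACK: hard-coded grid spacing until config parsing works",
    "# compute the flux at each cell boundary",
    "# temporary workaround for solver divergence",
]
general = [
    "# parse command-line arguments",
    "# FIXME: handle empty input",
    "# return the cached value",
    "# log the request",
]

rate_sci = satd_rate(scientific)
rate_gen = satd_rate(general)

# Relative prevalence: how many times more SATD the first set shows.
ratio = rate_sci / rate_gen
```

With these invented samples, three of the four scientific comments and one of the four general comments are flagged, so the ratio is 3.0. A keyword baseline like this misses debt that is admitted without a conventional marker, which is one plausible reason to prefer learned models for this task.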
Taxonomy
Topics: Scientific Computing and Data Management · Software Engineering Research · Research Data Management Practices
