An Empirical Study of Self-Admitted Technical Debt in Machine Learning Software
Aaditya Bhatia, Foutse Khomh, Bram Adams, Ahmed E Hassan

TL;DR
This study investigates self-admitted technical debt (SATD) in machine learning software. It finds that ML projects contain more technical debt than non-ML projects and accumulate it earlier, especially in data and model components.
Contribution
The paper provides the first empirical analysis of SATD in ML projects, comparing them with non-ML projects and analyzing how SATD evolves and what characterizes it.
Findings
The percentage of SATD in ML projects is twice that in non-ML projects.
Debt is more common in data preprocessing and model generation components.
Long-lasting SATDs are linked to extensive, low-complexity code changes.
Abstract
The emergence of open-source ML libraries such as TensorFlow and Google Auto ML has enabled developers to harness state-of-the-art ML algorithms with minimal overhead. However, during this accelerated ML development process, said developers may often make sub-optimal design and implementation decisions, leading to the introduction of technical debt that, if not addressed promptly, can have a significant impact on the quality of the ML-based software. Developers frequently acknowledge these sub-optimal design and development choices through code comments during software development. These comments, which often highlight areas requiring additional work or refinement in the future, are known as self-admitted technical debt (SATD). This paper aims to investigate SATD in ML code by analyzing 318 open-source ML projects across five domains, along with 318 non-ML projects. We detected SATD in…
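To make the notion of SATD concrete, here is a minimal Python sketch that flags code comments containing common debt markers. The detection approach used in the study is not given in the text here, so this is not the authors' method; the keyword list, the function name `find_satd_comments`, and the sample snippet are all assumptions chosen for illustration.

```python
import io
import re
import tokenize

# Hypothetical keyword list (an assumption, not taken from the paper).
SATD_PATTERN = re.compile(
    r"\b(todo|fixme|hack|xxx|workaround|temporary)\b", re.IGNORECASE
)


def find_satd_comments(source: str) -> list[tuple[int, str]]:
    """Return (line_number, comment_text) for comments matching a debt keyword."""
    hits = []
    # tokenize yields real comment tokens, so '#' inside string literals is ignored.
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT and SATD_PATTERN.search(tok.string):
            hits.append((tok.start[0], tok.string.lstrip("# ").strip()))
    return hits


# Illustrative ML-flavored snippet (invented for this example).
example = '''
def normalize(x):
    # TODO: handle NaN values before scaling
    return (x - x.mean()) / x.std()  # standard z-score

# HACK: hard-coded learning rate until config loading works
LR = 0.01
'''

hits = find_satd_comments(example)
for line_no, text in hits:
    print(line_no, text)
```

A keyword heuristic like this is simple but coarse: it misses debt admitted in free-form wording and can flag comments that merely mention one of the keywords.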
Taxonomy
Topics: Software Engineering Research · Machine Learning and Data Classification · Software Reliability and Analysis Research
