TL;DR
This paper investigates Keyword-Labeled Self-Admitted Technical Debt (KL-SATD) in source code comments. It analyzes the prevalence and content of KL-SATD comments and develops a machine learning classifier that detects them and identifies comments that are missing a SATD keyword, which aids technical debt management.
Contribution
It provides the first large-scale analysis of KL-SATD (33 software repositories, 13,588 KL-SATD comments), compares its content to manually labeled SATD comments from prior work, and introduces a logistic Lasso regression classifier for automatic detection.
Findings
The median percentage of KL-SATD comments among all comments is 1.52%.
KL-SATD comments contain words indicating code changes and uncertainty, such as remove, fix, maybe and probably.
The classifier achieves an AUC-ROC of 0.88 in detecting KL-SATD.
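To make the AUC-ROC figure concrete: AUC-ROC can be read as the probability that a randomly chosen positive example (here, a KL-SATD comment) receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that definition. The function name `auc_roc` and the toy labels and scores are illustrative assumptions, not data or code from the paper.

```python
def auc_roc(labels, scores):
    """AUC-ROC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for a KL-SATD comment, 0 otherwise (hypothetical toy encoding).
    scores: classifier scores, higher meaning "more likely KL-SATD".
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (made-up values): 2 positives, 3 negatives, 6 pairs in total.
# The positive scored 0.4 is outranked by the negative scored 0.5,
# so 5 of the 6 pairs are ordered correctly.
toy_labels = [1, 1, 0, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.3, 0.1]
toy_auc = auc_roc(toy_labels, toy_scores)
```

Under this reading, a value of 0.88 means the classifier ranks a KL-SATD comment above a non-KL-SATD comment in roughly 88% of such pairs.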
Abstract
When developers use different keywords such as TODO and FIXME in source code comments to describe self-admitted technical debt (SATD), we refer to it as Keyword-Labeled SATD (KL-SATD). We study KL-SATD from 33 software repositories with 13,588 KL-SATD comments. We find that the median percentage of KL-SATD comments among all comments is only 1.52%. We find that KL-SATD comment contents include words expressing code changes and uncertainty, such as remove, fix, maybe and probably. This distinguishes them from other comments. KL-SATD comment contents are similar to the manually labeled SATD comments of prior work. Our machine learning classifier using logistic Lasso regression performs well in detecting KL-SATD comments (AUC-ROC 0.88). Finally, we demonstrate that using machine learning we can identify comments that are currently missing but which should have a SATD keyword in…
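As a rough illustration of the definition above, the following sketch flags comments that contain a SATD keyword and computes the median percentage of such comments across repositories. It makes several assumptions: the keyword set is limited to the two keywords the abstract names (TODO and FIXME), matching is case-sensitive on whole words, and the repository names and comments are made-up toy data. It is not the authors' extraction pipeline.

```python
import re
from statistics import median

# Assumption: only the two keywords named in the abstract are used here.
KEYWORDS = ("TODO", "FIXME")
KEYWORD_RE = re.compile(r"\b(" + "|".join(KEYWORDS) + r")\b")


def is_kl_satd(comment: str) -> bool:
    """Return True if the comment contains one of the SATD keywords."""
    return KEYWORD_RE.search(comment) is not None


def kl_satd_percentage(comments) -> float:
    """Percentage of comments in one repository that are keyword-labeled."""
    if not comments:
        return 0.0
    labeled = sum(1 for c in comments if is_kl_satd(c))
    return 100.0 * labeled / len(comments)


def median_kl_satd_percentage(repos) -> float:
    """Median of the per-repository KL-SATD percentages."""
    return median(kl_satd_percentage(c) for c in repos.values())


# Hypothetical toy data: repository name -> list of comment strings.
toy_repos = {
    "repo_a": [
        "// TODO: remove this hack",
        "// returns the user id",
        "// parse the header",
        "// close the stream",
    ],
    "repo_b": [
        "# load configuration",
        "# start the worker",
    ],
    "repo_c": [
        "/* FIXME maybe this overflows */",
        "/* compute checksum */",
    ],
}

toy_median = median_kl_satd_percentage(toy_repos)
```

A comment such as "probably needs a cleanup" would not be flagged by this keyword check, which is the kind of case the classifier in the abstract is meant to catch.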
