A First Look at Duplicate and Near-duplicate Self-admitted Technical Debt Comments
Jerin Yasmin, Mohammad Sadegh Sheikhaei, Yuan Tian

TL;DR
This study investigates duplicate and near-duplicate self-admitted technical debt comments in open-source projects, developing a method to identify and analyze their characteristics, evolution, and relation to code clones.
Contribution
It introduces a novel automated approach to detect duplicate SATD comments and provides an empirical analysis of their characteristics and evolution in popular OSS projects.
Findings
Only 48.5% of duplicate SATD groups with the same root cause are in code clones.
33.9% of duplicate SATD pairs are introduced in the same commit.
Identified 3,520 duplicate and near-duplicate SATD comments across five projects.
Abstract
Self-admitted technical debt (SATD) refers to technical debt that is intentionally introduced by developers and explicitly documented in code comments or other software artifacts (e.g., issue reports) to annotate sub-optimal decisions made by developers in the software development process. In this work, we take the first look at the existence and characteristics of duplicate and near-duplicate SATD comments in five popular Apache OSS projects, i.e., JSPWiki, Helix, Jackrabbit, Archiva, and SystemML. We design a method to automatically identify groups of duplicate and near-duplicate SATD comments and track their evolution in the software system by mining the commit history of a software project. Leveraging the proposed method, we identified 3,520 duplicate and near-duplicate SATD comments from the target projects, which belong to 1,141 groups. We manually analyze the content and…
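The abstract does not say how the comment groups are formed. As a rough illustration only, the sketch below normalizes each comment and scores pairs by the Jaccard similarity of their token sets. It then merges any pair at or above a threshold into one group using union-find, so grouping is transitive. The normalization rule, the Jaccard measure, the 0.8 default threshold, and the names `normalize`, `jaccard` and `group_near_duplicates` are all hypothetical. This is not the authors' method, and it does not mine commit history to track how groups evolve.

```python
import re
from itertools import combinations


def normalize(comment: str) -> str:
    # Hypothetical normalization: lowercase, turn comment markers and
    # punctuation into spaces, collapse whitespace.
    text = re.sub(r"[^a-z0-9]+", " ", comment.lower())
    return " ".join(text.split())


def jaccard(a: str, b: str) -> float:
    # Jaccard similarity of the two token sets (1.0 when both are empty).
    sa, sb = set(a.split()), set(b.split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def group_near_duplicates(comments, threshold=0.8):
    # Assumed approach: pairwise comparison plus union-find, so that
    # similar pairs are merged transitively into groups.
    norm = [normalize(c) for c in comments]
    parent = list(range(len(comments)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(comments)), 2):
        if jaccard(norm[i], norm[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(comments)):
        groups.setdefault(find(i), []).append(i)
    # Keep only groups with at least two members (actual duplicates).
    return [g for g in groups.values() if len(g) > 1]


example = [
    "// TODO: handle null values here",
    "/* TODO handle null values here */",
    "// FIXME: this is a hack, refactor later",
    "// TODO: handle null value here",
    "// XXX temporary workaround for JCR-123",
]
print(group_near_duplicates(example))                 # → [[0, 1]]
print(group_near_duplicates(example, threshold=0.6))  # → [[0, 1, 3]]
```

At the stricter threshold, only the two comments that differ solely in comment syntax are grouped. Lowering it also captures the near-duplicate "value"/"values" variant. The pairwise loop is quadratic in the number of comments, so a real tool would likely need blocking or indexing for large code bases.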
