Descriptor: C++ Self-Admitted Technical Debt Dataset (CppSATD)
Phuoc Pham, Murali Sridharan, Matteo Esposito, Valentina Lenarduzzi

TL;DR
This paper introduces CppSATD, a large dataset of C++ source code comments in which developers explicitly admit technical debt (self-admitted technical debt, SATD). It addresses the gap in cross-language SATD research and supports future detection and analysis methods.
Contribution
The first extensive C++ SATD dataset, with over 531,000 annotated comments and their source code contexts, which facilitates cross-language SATD studies and detection techniques.
Findings
First large-scale C++ SATD dataset available
Enables cross-language SATD research and detection
Supports future development of SATD identification methods
Abstract
In software development, technical debt (TD) refers to suboptimal implementation choices that developers make to meet urgent deadlines under limited resources, posing challenges for future maintenance. Self-Admitted Technical Debt (SATD) is a sub-type of TD, representing specific TD instances "openly admitted" by the developers and often expressed through source code comments. Previous research on SATD has focused predominantly on the Java programming language, revealing a significant gap in cross-language SATD research. Such a narrow focus limits the generalizability of existing findings, as well as of SATD detection techniques, across multiple programming languages. Our work addresses this limitation by introducing CppSATD, a dedicated C++ SATD dataset comprising over 531,000 annotated comments and their source code contexts. Our dataset can serve as a foundation for future studies that aim to…
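To make the idea of debt "openly admitted" in a comment concrete, here is a minimal C++17 sketch of a naive keyword check over comment text. The marker list (`kSatdMarkers`) and the helper names (`to_lower`, `looks_like_satd`) are hypothetical and chosen only for illustration. They are not taken from the paper and do not describe how CppSATD was annotated.

```cpp
#include <algorithm>
#include <array>
#include <cctype>
#include <string>
#include <string_view>

// Hypothetical marker list, assumed for illustration only.
// It is NOT the annotation scheme used to build CppSATD.
constexpr std::array<std::string_view, 4> kSatdMarkers = {
    "todo", "fixme", "hack", "xxx"};

// Returns a lowercase copy of the input so matching is case-insensitive.
std::string to_lower(std::string_view text) {
    std::string out(text);
    std::transform(out.begin(), out.end(), out.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return out;
}

// Returns true if the comment contains any marker as a substring.
bool looks_like_satd(std::string_view comment) {
    const std::string lowered = to_lower(comment);
    return std::any_of(kSatdMarkers.begin(), kSatdMarkers.end(),
                       [&](std::string_view marker) {
                           return lowered.find(marker) != std::string::npos;
                       });
}
```

With this sketch, a comment such as `// TODO: replace this quick workaround` would be flagged, while `// Computes the checksum of the buffer` would not. Substring matching is deliberately crude (for example, "hack" also matches "hackathon"), which is one reason an annotated dataset is useful for building and evaluating better detectors.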
Taxonomy
Topics: Computational Physics and Python Applications · Distributed and Parallel Computing Systems
Methods: Focus
