Deep Learning and Data Augmentation for Detecting Self-Admitted Technical Debt
Edi Sutoyo, Paris Avgeriou, Andrea Capiluppi

TL;DR
This paper enhances the detection and categorization of Self-Admitted Technical Debt using deep learning models and data augmentation, addressing dataset imbalance issues to improve performance across different artifact types.
Contribution
It introduces a two-step approach with data augmentation to improve SATD detection and categorization, providing a balanced dataset and demonstrating significant performance improvements.
Findings
Data augmentation improves SATD detection accuracy.
Two-step approach effectively categorizes different SATD types.
Balanced dataset benefits future SATD research.
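To make the imbalance problem concrete, the following is a minimal sketch of how one might work out how many synthetic examples each SATD category needs. It assumes the goal is to raise every class to the size of the largest class; the summary does not state the authors' actual balancing target, and the function name and label counts are hypothetical.

```python
from collections import Counter

def augmentation_targets(labels):
    """Return, per class, how many extra samples are needed to match
    the largest class. Balancing to the majority class is an assumption
    made for this sketch, not a detail confirmed by the paper."""
    counts = Counter(labels)
    majority = max(counts.values())
    return {label: majority - n for label, n in counts.items()}

# Hypothetical, deliberately tiny label distribution for illustration only.
labels = ["code"] * 6 + ["design"] * 4 + ["requirement"] * 2 + ["test"] * 1
targets = augmentation_targets(labels)
print(targets)
```

In this toy distribution the rarest classes need the most augmented samples, which mirrors the observation that test and requirement debt are the hardest categories.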
Abstract
Self-Admitted Technical Debt (SATD) refers to circumstances where developers use textual artifacts to explain why the existing implementation is not optimal. Past research in detecting SATD has focused on either identifying SATD (classifying SATD items as SATD or not) or categorizing SATD (labeling instances as SATD that pertain to requirement, design, code, test debt, etc.). However, the performance of these approaches remains suboptimal, particularly for specific types of SATD, such as test and requirement debt, primarily due to extremely imbalanced datasets. To address these challenges, we build on earlier research by utilizing BiLSTM architecture for the binary identification of SATD and BERT architecture for categorizing different types of SATD. Despite their effectiveness, both architectures struggle with imbalanced data. Therefore, we employ a large language model data…
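The two-step structure described in the abstract (binary identification first, categorization only for items flagged as SATD) can be sketched as a simple cascade. This is an illustrative outline, not the authors' implementation: the keyword-based functions below are hypothetical stand-ins for the BiLSTM and BERT models so that the example runs without trained weights.

```python
from typing import Callable, Optional

def identify_stub(text: str) -> bool:
    """Step 1 placeholder (the paper uses a BiLSTM here):
    flag text that looks like SATD using a few keyword markers."""
    lowered = text.lower()
    return any(m in lowered for m in ("todo", "fixme", "hack", "workaround"))

def categorize_stub(text: str) -> str:
    """Step 2 placeholder (the paper uses BERT here):
    assign a debt type to text already flagged as SATD."""
    lowered = text.lower()
    if "test" in lowered:
        return "test"
    if "requirement" in lowered or "not implemented" in lowered:
        return "requirement"
    if "refactor" in lowered or "design" in lowered:
        return "design"
    return "code"

def two_step_satd(
    text: str,
    identify: Callable[[str], bool] = identify_stub,
    categorize: Callable[[str], str] = categorize_stub,
) -> Optional[str]:
    """Run the cascade: return None for non-SATD text,
    otherwise the category chosen by the second step."""
    if not identify(text):
        return None
    return categorize(text)

for sample in [
    "TODO: add unit tests for the parser",
    "FIXME: refactor this class",
    "Compute the checksum of the payload",
]:
    print(sample, "->", two_step_satd(sample))
```

Keeping the two steps behind plain callables means trained models could be swapped in for the stubs without changing the cascade logic.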
Taxonomy
Topics: Financial Distress and Bankruptcy Prediction
Methods: Attention Is All You Need · Sigmoid Activation · Tanh Activation · Long Short-Term Memory · Dropout · Bidirectional LSTM · Dense Connections · Layer Normalization · Residual Connection
