Automatic Detection and Analysis of Technical Debts in Peer-Review Documentation of R Packages
Junaed Younus Khan, Gias Uddin

TL;DR
This paper develops machine learning models, especially BERT, to automatically detect and analyze technical debts in R package peer-review documentation, revealing documentation debt as most prevalent and rapidly expanding.
Contribution
The paper introduces a novel ML-based approach for automatic detection of 10 types of technical debt in R package documentation, filling a gap in existing research.
Findings
The best-performing classifier, based on the deep ML model BERT, achieves F1-scores of 0.71-0.91 in TD detection.
Documentation debt is the most prevalent and rapidly expanding TD.
R packages on the general platform are more prone to TD than domain-specific packages.
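For readers unfamiliar with the metric behind the first finding, the F1-score is the harmonic mean of precision and recall, computed separately for each debt type. The sketch below is illustrative only and is not the authors' evaluation code: it assumes each review comment carries exactly one label (the summary does not say whether the task is single-label or multi-label), and the function name `per_class_f1`, the label names, and the sample data are all hypothetical.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} for single-label predictions (an assumed setup)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1      # correct prediction for this label
        else:
            fp[pred] += 1       # predicted label was wrong
            fn[truth] += 1      # true label was missed
    scores = {}
    for label in set(y_true) | set(y_pred):
        # F1 = 2TP / (2TP + FP + FN), equivalent to the harmonic mean
        # of precision and recall.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


# Hypothetical labels for five review comments (not data from the paper).
y_true = ["documentation", "documentation", "code", "test", "code"]
y_pred = ["documentation", "code", "code", "test", "documentation"]
scores = per_class_f1(y_true, y_pred)
```

Under these assumptions, a reported range such as 0.71-0.91 would correspond to the lowest and highest values in a dictionary like `scores`, one entry per debt type.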
Abstract
Technical debt (TD) is a metaphor for code-related problems that arise as a result of prioritizing speedy delivery over perfect code. Given that the reduction of TDs can have a long-term positive impact on the software development life cycle (SDLC), TDs are studied extensively in the literature. However, very little existing research has focused on technical debt in the R programming language, despite its popularity and usage. Recent research by Codabux et al. [21], based on an analysis of peer-review documentation, finds that R packages can have 10 diverse TD types. However, the findings are based on the manual analysis of a small sample of R package review comments. In this paper, we develop a suite of Machine Learning (ML) classifiers to detect the 10 TDs automatically. The best performing classifier is based on the deep ML model BERT, which achieves F1-scores of 0.71 - 0.91. We then apply the trained…
Taxonomy
Topics: Software Engineering Research · Scientific Computing and Data Management
