How Far Have LLMs Come Toward Automated SATD Taxonomy Construction?

Sota Nakashima; Yuta Ishimoto; Masanari Kondo; Tao Xiao; Yasutaka Kamei

arXiv:2506.09601 · cs.SE · October 24, 2025

Open Access

TL;DR

This paper explores how large language models can assist in semi-automating the creation of SATD taxonomies across various software domains, significantly reducing manual effort and cost.

Contribution

It introduces a structured LLM-driven pipeline for SATD taxonomy construction and demonstrates its effectiveness across multiple domains with minimal time and expense.

Findings

01 Successfully recovered domain-specific categories

02 Completed taxonomy generation in under two hours

03 Cost less than $1 per dataset

Abstract

Technical debt refers to suboptimal code that degrades software quality. When developers intentionally introduce such debt, it is called self-admitted technical debt (SATD). Since SATD hinders maintenance, identifying its categories is key to uncovering quality issues. Traditionally, constructing such taxonomies requires manually inspecting SATD comments and surrounding code, which is time-consuming, labor-intensive, and often inconsistent due to annotator subjectivity. In this study, we investigated to what extent large language models (LLMs) could generate SATD taxonomies. We designed a structured, LLM-driven pipeline that mirrors the taxonomy construction steps researchers typically follow. We evaluated it on SATD datasets from three domains: quantum software, smart contracts, and machine learning. It successfully recovered domain-specific categories reported in prior work, such as…

Taxonomy

Topics: Software Engineering Research · Advanced Malware Detection Techniques · Software System Performance and Reliability