LLM-Based Detection of Tangled Code Changes for Higher-Quality Method-Level Bug Datasets
Md Nahidul Islam Opu, Shaowei Wang, Shaiful Chowdhury

TL;DR
This paper explores using Large Language Models to detect tangled code changes at the method level, improving bug dataset quality and aiding bug prediction models.
Contribution
It introduces a novel LLM-based approach for fine-grained detection of tangled commits using commit messages and code diffs, with state-of-the-art performance.
Findings
Combining commit messages with code diffs boosts detection accuracy.
Few-shot and chain-of-thought prompting achieve high F1-scores (up to 0.883).
ML models trained on LLM embeddings further improve performance (F1-score 0.906).
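To make the prompting findings concrete, here is a minimal sketch of how zero-shot, few-shot, and chain-of-thought prompts for this binary classification task might be assembled from a commit message and a method-level diff. It is an illustration only: the paper's actual prompt wording is not reproduced on this page, and `Change`, `INSTRUCTION`, `build_prompt`, and `parse_label` are hypothetical names introduced for this example. No LLM is called; the code only builds prompt strings and parses a reply.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Change:
    """One method-level change (hypothetical container for this sketch)."""
    commit_message: str
    method_diff: str
    label: Optional[str] = None  # "tangled" / "not tangled", set only on few-shot examples


# Assumed instruction text; the paper's real prompt is not shown in this summary.
INSTRUCTION = (
    "A tangled change mixes unrelated modifications (for example a bug fix "
    "together with a refactoring or an enhancement). Classify the change below "
    "as 'tangled' or 'not tangled'."
)

SEPARATOR = "\n\n###\n\n"  # chosen so it does not collide with '---' diff headers


def format_change(change: Change, include_message: bool = True) -> str:
    """Render a change as text, optionally leaving out the commit message."""
    parts: List[str] = []
    if include_message:
        parts.append(f"Commit message:\n{change.commit_message}")
    parts.append(f"Method diff:\n{change.method_diff}")
    return "\n\n".join(parts)


def build_prompt(
    target: Change,
    examples: Sequence[Change] = (),
    chain_of_thought: bool = False,
    include_message: bool = True,
) -> str:
    """Zero-shot when `examples` is empty, few-shot otherwise."""
    sections = [INSTRUCTION]
    for ex in examples:
        sections.append(f"{format_change(ex, include_message)}\n\nAnswer: {ex.label}")
    query = format_change(target, include_message)
    if chain_of_thought:
        query += (
            "\n\nExplain step by step what each edit does, then finish with a "
            "final line of the form 'Answer: <label>'."
        )
    else:
        query += "\n\nAnswer:"
    sections.append(query)
    return SEPARATOR.join(sections)


def parse_label(response: str) -> bool:
    """Return True if the last non-empty line of the reply says 'tangled'."""
    lines = [ln for ln in response.strip().splitlines() if ln.strip()]
    if not lines:
        return False
    last = lines[-1].lower()
    return "tangled" in last and "not tangled" not in last
```

Passing `include_message=False` yields a diff-only prompt, which gives a simple way to compare against the combined commit-message-plus-diff input that the findings report as more accurate.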
Abstract
Tangled code changes, commits that conflate unrelated modifications such as bug fixes, refactorings, and enhancements, introduce significant noise into bug datasets and adversely affect the performance of bug prediction models. Addressing this issue at a fine-grained, method-level granularity remains unexplored. This is critical to address, as recent bug prediction models, driven by practitioner demand, are increasingly focusing on finer granularity rather than traditional class- or file-level predictions. This study investigates the utility of Large Language Models (LLMs) for detecting tangled code changes by leveraging both commit messages and method-level code diffs. We formulate the problem as a binary classification task and evaluate multiple prompting strategies, including zero-shot, few-shot, and chain-of-thought prompting, using state-of-the-art proprietary LLMs such as GPT-5…
