Beyond Strict Rules: Assessing the Effectiveness of Large Language Models for Code Smell Detection

Saymon Souza; Amanda Santana; Eduardo Figueiredo; Igor Muzetti; João Eduardo Montandon; Lionel Briand

arXiv:2601.09873·cs.SE·January 16, 2026

TL;DR

This study evaluates the effectiveness of large language models in detecting code smells in Java projects, proposing a combined approach with static analysis that improves detection performance for certain smells.

Contribution

The paper presents an empirical evaluation of four LLMs for code smell detection and proposes a strategy that combines LLMs with static analysis, which improves detection over either method alone for some smells.

Findings

1. LLMs perform well on structurally simple code smells.
2. Different LLMs excel at different types of smells.
3. Combined strategies improve detection metrics for some smells.

Abstract

Code smells are symptoms of potential code quality problems that may affect software maintainability, thus increasing development costs and impacting software reliability. Large language models (LLMs) have shown remarkable capabilities for supporting various software engineering activities, but their use for detecting code smells remains underexplored. However, unlike the rigid rules of static analysis tools, LLMs can support flexible and adaptable detection strategies tailored to the unique properties of code smells. This paper evaluates the effectiveness of four LLMs (DeepSeek-R1, GPT-5 mini, Llama-3.3, and Qwen2.5-Code) for detecting nine code smells across 30 Java projects. For the empirical evaluation, we created a ground-truth dataset by asking 76 developers to manually inspect 268 code-smell candidates. Our results indicate that LLMs perform strongly for structurally…

Taxonomy

Topics: Software Engineering Research · Software Engineering Techniques and Practices · Software Testing and Debugging Techniques