SmellBench: Evaluating LLM Agents on Architectural Code Smell Repair

Ion George Dinu; Marian Cristian Mihăescu; Traian Rebedea

arXiv:2605.07001 · cs.SE · May 13, 2026


TL;DR

This paper evaluates large language models on their ability to repair architectural code smells, introducing SmellBench, a framework for systematic assessment and revealing current limitations in cross-module refactoring.

Contribution

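It presents SmellBench, a framework for evaluating LLM agents on architectural smell repair. SmellBench includes smell-type-specific optimized prompts and a scoring methodology that separately measures repair effectiveness, false-positive identification, and net codebase impact.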

Findings

01

63.1% of detected smells are false positives

02

Best agent resolves 47.7% of true smells

03

Most aggressive agent introduces 140 new smells

Abstract

Architectural code smells erode software maintainability and are costly to repair manually, yet unlike localized bugs, they require cross-module reasoning about design intent that challenges both developers and automated tools. While large language model agents excel at bug fixing and code-level refactoring, their ability to repair architectural code smells remains unexplored. We present the first empirical evaluation of LLM agents on architectural code smell repair. We contribute SmellBench, a task orchestration framework that incorporates smell-type-specific optimized prompts and supports iterative multi-step execution, together with a scoring methodology that separately evaluates repair effectiveness, false positive identification, and net codebase impact. We evaluate 11 agent configurations from four model families (GPT, Claude, Gemini, Mistral) on 65 hard-severity architectural…
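The abstract says the scoring keeps three quantities separate instead of folding them into one number. The sketch below is only an illustration of that separation under stated assumptions, not SmellBench's actual implementation. The record type `TaskOutcome`, its field names, and the `score` function are hypothetical. The metric definitions are also assumptions: resolved true smells over all true smells, false positives the agent recognised over all false positives, and resolved true smells minus newly introduced smells.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class TaskOutcome:
    # Hypothetical per-task record; field names are illustrative, not SmellBench's schema.
    is_true_smell: bool           # ground truth: is the detected smell real?
    flagged_false_positive: bool  # did the agent report the smell as a false positive?
    resolved: bool                # did the agent's change remove the smell?
    new_smells: int               # smells present after the change that were absent before


def score(outcomes: Iterable[TaskOutcome]) -> dict:
    """Assumed metric definitions, kept separate rather than merged into one score."""
    outcomes = list(outcomes)
    true_smells = [o for o in outcomes if o.is_true_smell]
    false_positives = [o for o in outcomes if not o.is_true_smell]

    # Repair effectiveness: share of genuine smells the agent actually removed.
    resolved_true = sum(o.resolved for o in true_smells)
    repair_rate = resolved_true / len(true_smells) if true_smells else 0.0

    # False-positive identification: share of spurious detections the agent recognised.
    fp_flagged = sum(o.flagged_false_positive for o in false_positives)
    fp_identification = fp_flagged / len(false_positives) if false_positives else 0.0

    # Net codebase impact: true smells removed minus smells newly introduced, over all tasks.
    net_impact = resolved_true - sum(o.new_smells for o in outcomes)

    return {
        "repair_rate": repair_rate,
        "fp_identification": fp_identification,
        "net_impact": float(net_impact),
    }


if __name__ == "__main__":
    demo = [
        TaskOutcome(True, False, True, 0),   # real smell, fixed cleanly
        TaskOutcome(True, False, False, 2),  # real smell, not fixed, two new smells added
        TaskOutcome(False, True, False, 0),  # false positive, correctly recognised
        TaskOutcome(False, False, False, 1), # false positive, missed, one new smell added
    ]
    print(score(demo))
```

Keeping the metrics apart makes the trade-offs listed under Findings visible. An agent can post a high repair rate while its net impact turns negative if it introduces many new smells. In the toy data above, half of the true smells are repaired, yet the net impact is -2.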


Code & Models

Repositories

https://doi.org/10.5281/zenodo.19247588
