Investigating the Performance of Small Language Models in Detecting Test Smells in Manual Test Cases
Keila Lucas, Rohit Gheyi, Márcio Ribeiro, Fabio Palomba, Luana Martins, Elvys Soares

TL;DR
This paper evaluates Small Language Models (SLMs) for automatically detecting and explaining test smells in manual test cases. The best model, Phi-4, reaches a pass@2 of 97%, and the models explain the issues they flag and suggest improvements on their own.
Contribution
It introduces the application of SLMs for scalable, rule-free detection and explanation of test smells in real-world manual testing scenarios.
Findings
Phi-4 achieved 97% pass@2 in test smell detection.
SLMs can autonomously explain issues and suggest improvements.
SLMs enable low-cost, privacy-preserving test quality enhancement.
Abstract
Manual testing, in which testers follow natural language instructions to validate system behavior, remains crucial for uncovering issues not easily captured by automation. However, these test cases often suffer from test smells, quality issues such as ambiguity, redundancy, or missing checks that reduce test reliability and maintainability. While detection tools exist, they typically require manual rule definition and lack scalability. This study investigates the potential of Small Language Models (SLMs) for automatically detecting test smells. We evaluate Gemma3, Llama3.2, and Phi-4 on 143 real-world Ubuntu test cases, covering seven types of test smells. Phi-4 achieved the best results, reaching a pass@2 of 97% in detecting sentences with test smells, while Gemma3 and Llama3.2 reached approximately 91%. Beyond detection, SLMs autonomously explained issues and suggested improvements,…
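The pass@2 figures above can be read as "the model flagged the smell correctly in at least one of two attempts." This summary does not give the paper's exact definition, so the sketch below assumes that common reading. The function name `pass_at_k` and the sample outcomes are hypothetical and are not taken from the study.

```python
from typing import Sequence


def pass_at_k(attempts_per_item: Sequence[Sequence[bool]], k: int) -> float:
    """Fraction of items with at least one correct result in the first k attempts.

    Assumption: each inner sequence holds the ordered outcomes (True = correct)
    of repeated model runs on one test-case sentence. This is an illustrative
    definition, not necessarily the one used in the paper.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if not attempts_per_item:
        raise ValueError("need at least one item")
    solved = sum(1 for attempts in attempts_per_item if any(attempts[:k]))
    return solved / len(attempts_per_item)


# Hypothetical outcomes for four sentences, two attempts each.
results = [
    [True, False],   # correct on the first attempt
    [False, True],   # correct only on the second attempt
    [False, False],  # never correct
    [True, True],    # correct on both attempts
]

print(pass_at_k(results, 1))  # 0.5
print(pass_at_k(results, 2))  # 0.75
```

Under this reading, a second attempt can only raise the score, which is why pass@2 is at least as high as pass@1 on the same data.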
Taxonomy
Topics: Sentiment Analysis and Opinion Mining · AI in Service Interactions · Emotion and Mood Recognition
