Investigating the Performance of Small Language Models in Detecting Test Smells in Manual Test Cases

Keila Lucas; Rohit Gheyi; Márcio Ribeiro; Fabio Palomba; Luana Martins; Elvys Soares

arXiv:2507.13035·cs.SE·July 18, 2025

Open Access

TL;DR

This paper evaluates Small Language Models (SLMs) for automatically detecting and explaining test smells in manual test cases. The best model, Phi-4, reached a pass@2 of 97% in detecting sentences with test smells, and the models explained issues and suggested improvements on their own.

Contribution

The paper applies SLMs to scalable, rule-free detection and explanation of test smells in real-world manual test cases.

Findings

01 Phi-4 reached a pass@2 of 97% in detecting sentences with test smells; Gemma3 and Llama3.2 reached approximately 91%.

02 SLMs can autonomously explain issues and suggest improvements.

03 SLMs enable low-cost, privacy-preserving test quality enhancement.

Abstract

Manual testing, in which testers follow natural language instructions to validate system behavior, remains crucial for uncovering issues not easily captured by automation. However, these test cases often suffer from test smells: quality issues such as ambiguity, redundancy, or missing checks that reduce test reliability and maintainability. While detection tools exist, they typically require manual rule definition and lack scalability. This study investigates the potential of Small Language Models (SLMs) for automatically detecting test smells. We evaluate Gemma3, Llama3.2, and Phi-4 on 143 real-world Ubuntu test cases, covering seven types of test smells. Phi-4 achieved the best results, reaching a pass@2 of 97% in detecting sentences with test smells, while Gemma3 and Llama3.2 reached approximately 91%. Beyond detection, SLMs autonomously explained issues and suggested improvements,…
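
The abstract reports results as pass@2 but does not define the metric in this excerpt. The sketch below shows one common reading, under the assumption that a sentence counts as detected when at least one of the first k model attempts flags it correctly. The function name `pass_at_k` and the outcome data are hypothetical illustrations, not the authors' evaluation code or results.

```python
from typing import Sequence


def pass_at_k(attempts: Sequence[Sequence[bool]], k: int) -> float:
    """Fraction of items where at least one of the first k attempts is correct.

    Assumption: each inner sequence holds the per-attempt outcomes
    (True = correct detection) for one test sentence, in attempt order.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if not attempts:
        raise ValueError("need at least one item")
    solved = sum(1 for outcomes in attempts if any(outcomes[:k]))
    return solved / len(attempts)


# Hypothetical outcomes for four sentences, two attempts each.
outcomes = [
    [True, False],   # detected on the first attempt
    [False, True],   # detected only on the second attempt
    [False, False],  # never detected
    [True, True],    # detected on both attempts
]

print(f"pass@1 = {pass_at_k(outcomes, 1):.2f}")
print(f"pass@2 = {pass_at_k(outcomes, 2):.2f}")
```

Under this reading, pass@2 can only be equal to or higher than pass@1, since a second attempt gives each sentence another chance to be detected.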

Taxonomy

Topics: Sentiment Analysis and Opinion Mining · AI in Service Interactions · Emotion and Mood Recognition