Evaluating Large Language Models in Detecting Test Smells
Keila Lucas, Rohit Gheyi, Elvys Soares, Márcio Ribeiro, Ivan Machado

TL;DR
This paper evaluates how well three large language models (ChatGPT-4, Mistral Large, and Gemini Advanced) automatically detect 30 types of test smells in code written in seven programming languages.
Contribution
It provides an empirical assessment of LLMs' capability to identify test smells, highlighting their potential as tools for improving software quality.
Findings
ChatGPT-4 identified 21 test smell types
Gemini Advanced identified 17 test smell types
Mistral Large detected 15 test smell types
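The abstract states that the study evaluated 30 types of test smells, so the counts above can be read as rough coverage figures. The sketch below does only that arithmetic. The names `TOTAL_SMELL_TYPES`, `detected`, and `coverage` are illustrative and do not come from the paper. The percentages also assume that each count is a subset of the same 30 types.

```python
# Counts as reported in the findings; the total of 30 comes from the abstract.
# All identifiers here are illustrative, not taken from the paper.
TOTAL_SMELL_TYPES = 30
detected = {
    "ChatGPT-4": 21,
    "Gemini Advanced": 17,
    "Mistral Large": 15,
}


def coverage(count, total=TOTAL_SMELL_TYPES):
    """Return the fraction of evaluated smell types a model identified."""
    return count / total


for model, count in detected.items():
    print(f"{model}: {count}/{TOTAL_SMELL_TYPES} = {coverage(count):.1%}")
```

Under that assumption, ChatGPT-4 covers 70.0% of the evaluated smell types, Gemini Advanced about 56.7%, and Mistral Large 50.0%.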
Abstract
Test smells are coding issues that typically arise from inadequate practices, a lack of knowledge about effective testing, or deadline pressures to complete projects. The presence of test smells can negatively impact the maintainability and reliability of software. While there are tools that use advanced static analysis or machine learning techniques to detect test smells, these tools often require effort to be used. This study aims to evaluate the capability of Large Language Models (LLMs) in automatically detecting test smells. We evaluated ChatGPT-4, Mistral Large, and Gemini Advanced using 30 types of test smells across codebases in seven different programming languages collected from the literature. ChatGPT-4 identified 21 types of test smells. Gemini Advanced identified 17 types, while Mistral Large detected 15 types of test smells. Conclusion: The LLMs demonstrated potential as a…
Taxonomy
Topics: Advanced Chemical Sensor Technologies · Advanced Text Analysis Techniques · Sentiment Analysis and Opinion Mining
