Testing the Untestable? An Empirical Study on the Testing Process of LLM-Powered Software Systems
Cleyton Magalhaes, Italo Santos, Brody Stuart-Verner, Ronnie de Souza Santos

TL;DR
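This paper presents an empirical study of how large language model (LLM)-powered software systems are tested during real-world application development, drawing on 99 reports from students who built and deployed such applications and describing the challenges they faced and the testing strategies they adapted.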
Contribution
It provides the first detailed empirical analysis of testing practices and challenges specific to LLM-integrated systems in practical settings.
Findings
Testing combines manual and automated methods.
Common practices include exploratory testing and prompt iteration.
Challenges include hallucinations and unpredictability.
Abstract
Background: Software systems powered by large language models are becoming a routine part of everyday technologies, supporting applications across a wide range of domains. In software engineering, many studies have focused on how LLMs support tasks such as code generation, debugging, and documentation. However, there has been limited focus on how full systems that integrate LLMs are tested during development. Aims: This study explores how LLM-powered systems are tested in the context of real-world application development. Method: We conducted an exploratory case study using 99 individual reports written by students who built and deployed LLM-powered applications as part of a university course. Each report was independently analyzed using thematic analysis, supported by a structured coding process. Results: Testing strategies combined manual and automated methods to evaluate both system…
Taxonomy
Topics: Software Engineering Techniques and Practices · Software Engineering Research · Software Testing and Debugging Techniques
