From Untestable to Testable: Metamorphic Testing in the Age of LLMs
Valerio Terragni

TL;DR
This paper explores how metamorphic testing can address the challenges of testing software systems that integrate large language models (LLMs), which are powerful but unreliable and for which labeled ground truth rarely scales.
Contribution
It argues for applying metamorphic testing to LLM-based systems as a scalable way to test them without relying on labeled ground truth.
Findings
Metamorphic testing effectively detects errors in LLMs.
The approach reduces reliance on labeled datasets.
It enhances the reliability of AI system testing.
Abstract
This article discusses the challenges of testing software systems with increasingly integrated AI and LLM functionalities. LLMs are powerful but unreliable, and labeled ground truth for testing rarely scales. Metamorphic Testing solves this by turning relations among multiple test executions into executable test oracles.
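The core idea in the abstract is that a relation among several executions can serve as the test oracle. The small Python sketch below illustrates this; it is not the author's method. All names are hypothetical assumptions: `toy_sentiment` is a deliberately case-sensitive keyword classifier standing in for an LLM call, and `to_upper` and `append_neutral_sentence` are example metamorphic relations under which the output label is expected to stay the same. No ground-truth label appears anywhere, because the check only compares the source execution with the follow-up execution.

```python
from typing import Callable, Iterable

POSITIVE = {"good", "great", "excellent", "love"}
NEGATIVE = {"bad", "awful", "terrible", "hate"}


# Hypothetical stand-in for an LLM-backed classifier. It is deliberately
# case-sensitive (a plausible defect) so that a metamorphic relation can expose it.
def toy_sentiment(text: str) -> str:
    words = [w.strip(".,!?") for w in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Assumed metamorphic relations: semantics-preserving input transformations
# that should leave the predicted label unchanged.
def to_upper(text: str) -> str:
    return text.upper()


def append_neutral_sentence(text: str) -> str:
    return text + " The review was written on a Tuesday."


def check_invariance_mr(
    system: Callable[[str], str],
    inputs: Iterable[str],
    transform: Callable[[str], str],
) -> list[tuple[str, str, str]]:
    """Return (input, source_output, follow_up_output) for each MR violation.

    The oracle is the relation between two executions, not a labeled answer.
    """
    violations = []
    for source in inputs:
        source_out = system(source)
        follow_up_out = system(transform(source))
        if source_out != follow_up_out:
            violations.append((source, source_out, follow_up_out))
    return violations


reviews = [
    "I love this phone, great battery.",
    "Awful screen and bad support.",
    "It arrived on time.",
]
for mr in (to_upper, append_neutral_sentence):
    print(mr.__name__, check_invariance_mr(toy_sentiment, reviews, mr))
```

Here `to_upper` flags the first two reviews, because the toy classifier stops recognizing upper-case keywords. `append_neutral_sentence` reports no violations. Against a real LLM, the same harness would wrap the model call, and relations would typically be checked over many sampled inputs, since a single run is nondeterministic.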
Taxonomy
Topics: Software Testing and Debugging Techniques · Software System Performance and Reliability · Scientific Computing and Data Management
