From Untestable to Testable: Metamorphic Testing in the Age of LLMs

Valerio Terragni

arXiv:2603.24774·cs.SE·March 27, 2026

Open Access

TL;DR

This paper explores how metamorphic testing can address the challenge of testing software systems that integrate large language models, which are powerful but unreliable and for which labeled ground truth rarely scales.

Contribution

It presents metamorphic testing as a scalable way to test LLM-based systems without relying on labeled ground truth.

Findings

01

Metamorphic testing effectively detects errors in LLMs.

02

The approach reduces reliance on labeled datasets.

03

It enhances the reliability of AI system testing.

Abstract

This article discusses the challenges of testing software systems with increasingly integrated AI and LLM functionalities. LLMs are powerful but unreliable, and labeled ground truth for testing rarely scales. Metamorphic Testing solves this by turning relations among multiple test executions into executable test oracles.
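
As a rough illustration of what "relations among multiple test executions" can mean in practice, the sketch below checks one metamorphic relation without any labeled expected output. It is not taken from the paper. The `toy_classifier` function is a hypothetical stand-in for an LLM-backed component, with a deliberately planted case-sensitivity bug, and the relation "changing letter case should not change the predicted sentiment" is an assumption chosen for the example.

```python
from typing import Callable, List, Tuple


def toy_classifier(text: str) -> str:
    """Hypothetical stand-in for an LLM-backed sentiment classifier."""
    positive = {"good", "great", "excellent"}
    # Planted bug for the demo: matching is case-sensitive.
    return "positive" if any(w in positive for w in text.split()) else "negative"


def uppercase(text: str) -> str:
    """Input transformation assumed to preserve sentiment."""
    return text.upper()


def pad(text: str) -> str:
    """Another transformation assumed to preserve sentiment."""
    return "  " + text + "  "


def check_relation(
    sut: Callable[[str], str],
    transform: Callable[[str], str],
    inputs: List[str],
) -> List[Tuple[str, str]]:
    """Return (source, follow-up) pairs whose outputs disagree.

    No ground-truth label is consulted: the oracle is the relation
    between the two executions, here plain output equality.
    """
    violations = []
    for source in inputs:
        follow_up = transform(source)
        if sut(source) != sut(follow_up):
            violations.append((source, follow_up))
    return violations


if __name__ == "__main__":
    samples = ["the food was great", "terrible service"]
    print(check_relation(toy_classifier, uppercase, samples))
    print(check_relation(toy_classifier, pad, samples))
```

With a real LLM the equality check would likely need to be relaxed, for example to a semantic-similarity threshold, since outputs can vary between runs. That choice is left open here.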

Taxonomy

Topics: Software Testing and Debugging Techniques · Software System Performance and Reliability · Scientific Computing and Data Management