Search-based Selection of Metamorphic Relations for Optimized Robustness Testing of Large Language Models

Sangwon Hyun; Shaukat Ali; M. Ali Babar

arXiv:2507.05565·cs.SE·July 9, 2025

Open Access

TL;DR

This paper introduces a search-based method to optimize the selection of metamorphic relations for testing large language models, improving failure detection and expanding test coverage through combinatorial perturbations.

Contribution

The paper proposes a search-based approach, using four search algorithms, to select effective MRs, including MRs built from combinatorial perturbations. Among the four algorithms, MOEA/D performs best in LLM robustness testing.

Findings

01

MOEA/D outperformed other algorithms in MR optimization.

02

The study identified 'silver bullet' MRs that are especially effective at confusing LLMs.

03

Expanding the test space with combinatorial perturbations enhances robustness assessment.
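To make the idea of combinatorial perturbations concrete, here is a minimal Python sketch of how a pool of single-perturbation MRs could be expanded into combined ones. The three perturbation functions, the `compose` helper, and `build_mr_space` are all hypothetical names chosen for illustration; they are not taken from the paper, whose actual perturbations and tooling may differ.

```python
from itertools import combinations
from typing import Callable, Dict, Sequence

Perturbation = Callable[[str], str]

# Hypothetical single perturbations (illustrative only, not from the paper).
def to_upper(text: str) -> str:
    return text.upper()

def add_politeness(text: str) -> str:
    return "Please " + text

def double_spaces(text: str) -> str:
    return text.replace(" ", "  ")

SINGLE: Dict[str, Perturbation] = {
    "upper": to_upper,
    "polite": add_politeness,
    "spaces": double_spaces,
}

def compose(fns: Sequence[Perturbation]) -> Perturbation:
    """Return a perturbation that applies each function in order."""
    def combined(text: str) -> str:
        for fn in fns:
            text = fn(text)
        return text
    return combined

def build_mr_space(single: Dict[str, Perturbation], max_order: int) -> Dict[str, Perturbation]:
    """Enumerate every combination of 1..max_order distinct perturbations."""
    space: Dict[str, Perturbation] = {}
    names = list(single)
    for order in range(1, max_order + 1):
        for combo in combinations(names, order):
            space["+".join(combo)] = compose([single[n] for n in combo])
    return space

mr_space = build_mr_space(SINGLE, max_order=2)
perturbed = mr_space["upper+polite"]("hi there")
```

With three single perturbations and `max_order=2`, the space grows from 3 to 6 candidate MRs. This growth in the candidate pool is one reason selecting a small, effective MR group becomes a search problem.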

Abstract

Assessing trustworthiness properties of Large Language Models (LLMs), such as robustness, has garnered significant attention. Recently, metamorphic testing, which defines Metamorphic Relations (MRs), has been widely applied to evaluate the robustness of LLM executions. However, MR-based robustness testing still requires a scalable number of MRs, which makes optimizing the selection of MRs necessary. Most extant LLM testing studies are limited to automatically generating test cases (i.e., MRs) to enhance failure detection. Additionally, most studies considered only a limited test space of single-perturbation MRs in their evaluation of LLMs. In contrast, our paper proposes a search-based approach for optimizing the MR groups to maximize failure detection and minimize the LLM execution cost. Moreover, our approach covers the combinatorial perturbations in MRs, facilitating the expansion of…
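The two objectives named in the abstract (maximize failure detection, minimize LLM execution cost) can be illustrated with a small Pareto-front sketch in Python. Everything below is an assumption made for illustration: the `FAILURES` and `COST` tables are made-up toy data, and the exhaustive enumeration stands in for the search algorithms the paper uses (such as MOEA/D), which would be needed once the MR pool is too large to enumerate.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Set, Tuple

# Hypothetical toy data: which test inputs each MR exposes as failures,
# and a made-up execution cost per MR (e.g. number of LLM calls).
FAILURES: Dict[str, Set[int]] = {
    "MR1": {1, 2},
    "MR2": {2, 3},
    "MR3": {1, 2, 3},
    "MR4": {4},
}
COST: Dict[str, float] = {"MR1": 1.0, "MR2": 1.0, "MR3": 3.0, "MR4": 1.0}

Score = Tuple[int, float]  # (distinct failures detected, total cost)

def evaluate(group: Iterable[str]) -> Score:
    """Score an MR group on both objectives."""
    members = list(group)
    detected: Set[int] = set().union(*(FAILURES[m] for m in members))
    return len(detected), sum(COST[m] for m in members)

def dominates(a: Score, b: Score) -> bool:
    """True if a is at least as good as b on both objectives and not identical."""
    return a[0] >= b[0] and a[1] <= b[1] and a != b

def pareto_front(mrs: Iterable[str]) -> Dict[FrozenSet[str], Score]:
    """Brute-force all non-empty MR groups and keep the non-dominated ones."""
    pool = sorted(mrs)
    scored = {
        frozenset(c): evaluate(c)
        for r in range(1, len(pool) + 1)
        for c in combinations(pool, r)
    }
    return {
        g: s for g, s in scored.items()
        if not any(dominates(t, s) for t in scored.values())
    }

front = pareto_front(FAILURES)
```

In this toy setup the expensive `MR3` never appears alone on the front, because cheaper groups detect the same failures. A multi-objective search explores the same kind of trade-off without enumerating every subset.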

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Ethics and Social Impacts of AI · Explainable Artificial Intelligence (XAI)