Semantic Invariance in Agentic AI

I. de Zarzà; J. de Curtò; Jordi Cabot; Pietro Manzoni; Carlos T. Calafate

arXiv:2603.13173·cs.AI·March 17, 2026

Open Access

TL;DR

This paper introduces a metamorphic testing framework to evaluate the semantic invariance of large language models acting as reasoning agents, revealing that larger models are not necessarily more robust to input variations.

Contribution

We develop a systematic testing method using semantic-preserving transformations to assess LLM reasoning robustness across multiple models and domains.

Findings

01. Smaller models like Qwen3-30B-A3B show higher robustness (79.6%) than larger models.

02. Model scale does not correlate with increased robustness.

03. Semantic invariance varies significantly across models and transformations.

Abstract

Large Language Models (LLMs) increasingly serve as autonomous reasoning agents in decision support, scientific problem-solving, and multi-agent coordination systems. However, deploying LLM agents in consequential applications requires assurance that their reasoning remains stable under semantically equivalent input variations, a property we term semantic invariance. Standard benchmark evaluations, which assess accuracy on fixed, canonical problem formulations, fail to capture this critical reliability dimension. To address this shortcoming, in this paper we present a metamorphic testing framework for systematically assessing the robustness of LLM reasoning agents, applying eight semantic-preserving transformations (identity, paraphrase, fact reordering, expansion, contraction, academic context, business context, and contrastive formulation) across seven foundation models spanning four…
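The metamorphic testing idea described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the `Agent` and `Transform` types, the `invariance_rate` function, the two toy agents, and the use of exact answer agreement with the identity baseline as the score. Only two of the eight transformations (identity and fact reordering) are shown, because they can be implemented without calling a model; the paper's actual transformations, prompts, and robustness metric may differ.

```python
import re
from typing import Callable, Dict, List, Tuple

# Hypothetical types: an agent maps a prompt to a final answer string,
# and a transform builds a prompt from a list of facts plus a question.
Agent = Callable[[str], str]
Transform = Callable[[List[str], str], str]
Problem = Tuple[List[str], str]


def identity(facts: List[str], question: str) -> str:
    """Canonical formulation: facts in their original order, then the question."""
    return " ".join(facts + [question])


def reorder_facts(facts: List[str], question: str) -> str:
    """Reverse the facts: a deterministic stand-in for fact reordering."""
    return " ".join(list(reversed(facts)) + [question])


TRANSFORMS: Dict[str, Transform] = {
    "identity": identity,
    "fact_reordering": reorder_facts,
}


def invariance_rate(
    agent: Agent, problems: List[Problem], transforms: Dict[str, Transform]
) -> Dict[str, float]:
    """Per transformation, the fraction of problems whose answer matches
    the answer given for the identity (canonical) formulation."""
    rates: Dict[str, float] = {}
    for name, transform in transforms.items():
        matches = 0
        for facts, question in problems:
            baseline = agent(identity(facts, question))
            variant = agent(transform(facts, question))
            matches += int(baseline == variant)
        rates[name] = matches / len(problems)
    return rates


# Toy agents standing in for LLM calls (assumptions, not real models).
def summing_agent(prompt: str) -> str:
    """Order-insensitive: adds every integer that appears in the prompt."""
    return str(sum(int(n) for n in re.findall(r"\d+", prompt)))


def first_number_agent(prompt: str) -> str:
    """Order-sensitive: returns the first integer that appears in the prompt."""
    return re.findall(r"\d+", prompt)[0]


PROBLEMS: List[Problem] = [
    (["Ana has 3 apples.", "Ben has 5 apples."], "How many apples in total?"),
    (["A tank holds 10 liters.", "A jug holds 2 liters."], "What is the combined volume?"),
]

if __name__ == "__main__":
    print(invariance_rate(summing_agent, PROBLEMS, TRANSFORMS))
    print(invariance_rate(first_number_agent, PROBLEMS, TRANSFORMS))
```

In this sketch the order-insensitive toy agent agrees with its baseline on every problem, while the order-sensitive one changes its answer whenever the facts are reversed. Note that the score measures agreement with the baseline answer rather than correctness, so an agent can be invariant and still wrong.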

Taxonomy

Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Multimodal Machine Learning Applications