Same Answer, Different Representations: Hidden instability in VLMs

Farooq Ahmad Wani; Alessandro Suglia; Rohit Saxena; Aryo Pradipta Gema; Wai-Chung Kwan; Fazl Barez; Maria Sofia Bucarelli; Fabrizio Silvestri; Pasquale Minervini

arXiv:2602.06652 · cs.AI · February 9, 2026

TL;DR

This paper introduces an evaluation framework for Vision Language Models (VLMs) that measures the stability of internal representations, not only of outputs. It finds that internal representations often drift even when outputs stay stable, and that larger models are not necessarily more robust.

Contribution

The work presents a representation-aware and frequency-aware evaluation method for VLMs, uncovering internal instability and failure modes not captured by output-level metrics.
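The page does not say how the frequency-aware part of the evaluation is implemented. Purely as an illustration, one common way to probe spectral sensitivity is to split an image into low- and high-frequency bands with a 2D FFT and then perturb or remove one band before querying the model. The sketch below assumes NumPy, grayscale images, and a hard circular cutoff; `frequency_split` and `radial_lowpass_mask` are hypothetical helpers, not the authors' code.

```python
import numpy as np

def radial_lowpass_mask(shape, radius):
    """True for frequencies within `radius` of the zero frequency in an fftshift-ed spectrum."""
    h, w = shape
    yy, xx = np.ogrid[:h, :w]
    return (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= radius ** 2

def frequency_split(image, radius):
    """Split a 2D grayscale image into low- and high-frequency components.

    Hypothetical helper: a frequency-aware probe could perturb or drop one band
    and then compare the model's answers and embeddings on the result.
    """
    spectrum = np.fft.fftshift(np.fft.fft2(image))  # move zero frequency to the center
    mask = radial_lowpass_mask(image.shape, radius)
    low = np.fft.ifft2(np.fft.ifftshift(spectrum * mask)).real
    high = np.fft.ifft2(np.fft.ifftshift(spectrum * ~mask)).real
    return low, high

# Usage on a random "image" and on a constant one (which has only a zero-frequency term).
rng = np.random.default_rng(0)
img = rng.random((32, 32))
low, high = frequency_split(img, radius=4)
flat_low, flat_high = frequency_split(np.full((32, 32), 0.5), radius=4)
```

Because the two masks are complementary, the bands sum back to the original image. A hard cutoff like this one can introduce ringing artifacts, so a smooth (for example Gaussian) mask is a common alternative.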

Findings

01

Models often show internal representation drift despite stable predictions.

02

Larger models are not necessarily more robust; in some cases they are more sensitive to perturbations.

03

Perturbations affect reasoning and hallucination differently, in some cases even reducing false positives.
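Finding 01 can be made concrete with a small check. The sketch assumes that, for each input, you already have one embedding vector for the clean version and one for the perturbed version (for example, pooled hidden states from a single layer), plus the model's predicted answer for each. It uses cosine distance as the drift measure, which is an assumption; the page does not state the paper's exact metric. `cosine_drift` and `hidden_drift_report` are hypothetical names.

```python
import numpy as np

def cosine_drift(a, b):
    """Row-wise cosine distance 1 - cos(a_i, b_i) between two (n, d) embedding arrays."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return 1.0 - np.sum(a * b, axis=1)

def hidden_drift_report(clean_emb, pert_emb, clean_pred, pert_pred):
    """Summarize representation drift on samples whose predicted answer did NOT change.

    Hypothetical helper: embeddings stand in for, e.g., pooled hidden states of a VLM
    on the clean and perturbed version of each input.
    """
    drift = cosine_drift(clean_emb, pert_emb)
    stable = np.asarray(clean_pred) == np.asarray(pert_pred)
    return {
        "answer_consistency": float(stable.mean()),
        "mean_drift_when_stable": float(drift[stable].mean()) if stable.any() else float("nan"),
    }

# Toy usage: sample 1 keeps its answer but its representation is flipped.
rng = np.random.default_rng(0)
clean = rng.normal(size=(4, 8))
pert = clean.copy()
pert[1] = -clean[1]
report = hidden_drift_report(clean, pert, ["A", "B", "C", "D"], ["A", "B", "C", "A"])
# Sample 3 changed its answer, so only samples 0-2 enter mean_drift_when_stable:
# answer_consistency = 0.75, mean_drift_when_stable ≈ 0.667
```

High answer consistency together with a large `mean_drift_when_stable` is the pattern the finding describes: outputs look robust while the representations move.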

Abstract

The robustness of Vision Language Models (VLMs) is commonly assessed through output-level invariance, implicitly assuming that stable predictions reflect stable multimodal processing. In this work, we argue that this assumption is insufficient. We introduce a representation-aware and frequency-aware evaluation framework that measures internal embedding drift, spectral sensitivity, and structural smoothness (spatial consistency of vision tokens), alongside standard label-based metrics. Applying this framework to modern VLMs across the SEEDBench, MMMU, and POPE datasets reveals three distinct failure modes. First, models frequently preserve predicted answers while undergoing substantial internal representation drift; for perturbations such as text overlays, this drift approaches the magnitude of inter-image variability, indicating that representations move to regions typically occupied by…
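The abstract compares drift with inter-image variability and describes structural smoothness as the spatial consistency of vision tokens, but gives no formulas here. The sketch below makes two assumptions: drift is normalized by the mean cosine distance between embeddings of different clean images, and smoothness is the mean cosine similarity between horizontally and vertically adjacent tokens on the vision-token grid. `relative_drift` and `structural_smoothness` are hypothetical illustrations, not the paper's definitions.

```python
import numpy as np

def _unit(x):
    """Normalize vectors along the last axis to unit length."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def relative_drift(clean_emb, pert_emb):
    """Mean clean-vs-perturbed cosine distance divided by the mean cosine distance
    between embeddings of *different* clean images (assumed normalization).
    A value near 1 means a perturbation moves a representation about as far as
    swapping in a different image would."""
    c, p = _unit(clean_emb), _unit(pert_emb)
    drift = np.mean(1.0 - np.sum(c * p, axis=1))
    sims = c @ c.T
    off_diag = ~np.eye(len(c), dtype=bool)  # exclude each image compared with itself
    inter = np.mean(1.0 - sims[off_diag])
    return float(drift / inter)

def structural_smoothness(tokens):
    """Mean cosine similarity between adjacent vision tokens on an (H, W, d) grid
    (assumed definition of spatial consistency)."""
    t = _unit(tokens)
    horiz = np.sum(t[:, :-1] * t[:, 1:], axis=-1)  # left-right neighbours
    vert = np.sum(t[:-1, :] * t[1:, :], axis=-1)   # up-down neighbours
    return float(np.concatenate([horiz.ravel(), vert.ravel()]).mean())

# Toy usage with random stand-in embeddings.
rng = np.random.default_rng(0)
clean = rng.normal(size=(16, 32))                        # 16 "images", 32-dim embeddings
other = rng.normal(size=(16, 32))                        # unrelated embeddings
same_ratio = relative_drift(clean, clean)                # approximately 0: no movement
swap_ratio = relative_drift(clean, other)                # near 1: as far as another image
grid_smooth = structural_smoothness(np.ones((4, 4, 8)))  # approximately 1: identical tokens
```

Under this normalization, a ratio approaching 1 corresponds to the abstract's observation that some perturbations, such as text overlays, move representations roughly as far as a different image would.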

Taxonomy

Topics: Multimodal Machine Learning Applications · Language, Metaphor, and Cognition · Categorization, perception, and language