More Than Meets the Eye: Measuring the Semiotic Gap in Vision-Language Models via Semantic Anchorage

Wei He

arXiv:2604.17354·cs.CL·April 21, 2026

TL;DR

This paper introduces DIVA, a benchmark, and the Semantic Alignment Gap, a metric, for evaluating how vision-language models handle idiomatic versus literal readings of noun compounds. It reports a consistent bias toward literal grounding that is most pronounced at high visual fidelity.

Contribution

The paper proposes DIVA and Semantic Alignment Gap as tools to measure and analyze the semiotic gap in VLMs, highlighting the impact of visual fidelity on symbolic understanding.

Findings

01. Models show a literal superiority bias regardless of scale.

02. Increased visual fidelity weakens symbolic alignment.

03. Iconographic abstraction improves compositional understanding.

Abstract

Vision-Language Models (VLMs) excel at photorealistic generation, yet often struggle to represent abstract meaning such as idiomatic interpretations of noun compounds. To study whether high visual fidelity interferes with idiomatic compositionality under visual abstraction, we introduce DIVA, a controlled benchmark that replaces high-fidelity visual detail with schematic iconicity by generating paired, sense-anchored visualizations for literal and idiomatic readings. We further propose Semantic Alignment Gap ($\Delta$), an architecture-agnostic metric that quantifies divergence between literal and idiomatic visual grounding. We additionally introduce a directional signed bias $b(t)$ to separately measure the direction and strength of literal preference. Evaluating 8 recent VLMs, we reveal a consistent Literal Superiority Bias: model scale alone does not resolve literal preference, and…
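
The abstract does not give formulas for $\Delta$ or $b(t)$, so the following is only an illustrative sketch of how a gap and a signed bias of this general kind could be computed, not the paper's definition. It assumes, hypothetically, that each term $t$ has one text embedding plus one embedding each for its literal and idiomatic image, that alignment is cosine similarity, that $b(t)$ is the literal score minus the idiomatic score, and that the gap is its magnitude. The function names `cosine`, `signed_bias`, and `alignment_gap` are invented for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_u * norm_v)


def signed_bias(text_emb, literal_img_emb, idiomatic_img_emb):
    """Assumed form of b(t): literal alignment minus idiomatic alignment.

    Positive values mean the term sits closer to its literal image,
    negative values mean it sits closer to its idiomatic image.
    """
    return cosine(text_emb, literal_img_emb) - cosine(text_emb, idiomatic_img_emb)


def alignment_gap(text_emb, literal_img_emb, idiomatic_img_emb):
    """Assumed form of the gap: the magnitude of the signed bias."""
    return abs(signed_bias(text_emb, literal_img_emb, idiomatic_img_emb))
```

Under these assumptions, the sign of `signed_bias` carries the direction of preference and `alignment_gap` carries only its strength, which mirrors the separation the abstract describes. Because the sketch consumes plain embedding vectors, it does not depend on any particular model architecture.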
