Grid2Matrix: Revealing Digital Agnosia in Vision-Language Models

Yunkai Zhang; Linda Li; Yingxin Cui; Xiyuan Ruan; Zeyu Zheng; Kezhen Chen; Yi Zhang; Diji Yang

arXiv:2604.09687 · cs.CV · April 16, 2026

TL;DR

Grid2Matrix (G2M) is a new benchmark showing that vision-language models often fail to faithfully capture all visual details in simple color grids, breaking down abruptly even on small grids. The authors call this failure mode Digital Agnosia.

Contribution

The paper introduces G2M, a controlled benchmark to analyze visual detail retention in VLMs and uncovers a systematic failure mode called Digital Agnosia.

Findings

01 VLMs collapse early in zero-shot end-to-end evaluation, failing on surprisingly small grids.

02 Their visual encoders retain substantially more grid information than the end-to-end outputs reveal.

03 Failures depend on how grid cells overlap with the visual encoder's patches.
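One way to reason about the third finding is to count how many encoder patches each grid cell straddles. The sketch below is illustrative and not taken from the paper: it assumes a square image fed to the encoder without resizing, square non-overlapping patches starting at pixel 0, and an example patch size of 14 px; the function names `axis_patch_counts`, `patches_per_cell` and `single_patch_fraction` are hypothetical. Real VLMs often resize or tile images, so actual alignment may differ.

```python
def axis_patch_counts(image_px, n_cells, patch_px):
    """Number of patches each cell spans along one axis.

    Assumes an image of image_px pixels per side split into n_cells equal
    cells, and non-overlapping patch_px-wide patches starting at pixel 0.
    Integer arithmetic keeps cell boundaries exact (no float rounding)."""
    denom = n_cells * patch_px
    counts = []
    for i in range(n_cells):
        # Cell i covers pixels [i*image_px/n_cells, (i+1)*image_px/n_cells).
        first = (i * image_px) // denom              # floor(start / patch_px)
        last = -(-(i + 1) * image_px // denom) - 1   # ceil(end / patch_px) - 1
        counts.append(last - first + 1)
    return counts


def patches_per_cell(image_px, n_cells, patch_px):
    """2-D table: how many patches each square grid cell overlaps."""
    counts = axis_patch_counts(image_px, n_cells, patch_px)
    return [[cy * cx for cx in counts] for cy in counts]


def single_patch_fraction(image_px, n_cells, patch_px):
    """Fraction of cells that fall entirely inside one patch."""
    table = patches_per_cell(image_px, n_cells, patch_px)
    return sum(v == 1 for row in table for v in row) / n_cells ** 2


print(axis_patch_counts(224, 10, 14))      # [2, 3, 2, 3, 2, 2, 3, 2, 3, 2]
print(single_patch_fraction(224, 16, 14))  # 1.0 (cells and patches align exactly)
print(single_patch_fraction(224, 10, 14))  # 0.0 (every cell straddles patch edges)
```

This count only captures cells split across patches. When cells are smaller than a patch (for example 32 cells across a 224 px image with 14 px patches), each patch mixes several cells, which would need a separate cells-per-patch measure.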

Abstract

Vision-Language Models (VLMs) excel on many multimodal reasoning benchmarks, but these evaluations often do not require an exhaustive readout of the image and can therefore obscure failures in faithfully capturing all visual details. We introduce Grid2Matrix (G2M), a controlled benchmark in which a model is shown a color grid and a color-to-number mapping, and must output the corresponding matrix. By varying grid size and the number of colors, G2M provides a simple way to increase visual complexity while minimizing semantic confounds. We find that VLMs exhibit a sharp early collapse in zero-shot end-to-end evaluation, failing on surprisingly small grids rather than degrading gradually as the task becomes denser. We probe the visual encoders of VLMs from two representative families and find that they preserve substantially more of the grid information than the corresponding end-to-end…
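The task described above can be made concrete with a small instance generator and scorer. This is a minimal sketch under stated assumptions, not the authors' code: the palette, the 0-based color-to-number labels, the function names `make_instance` and `score`, and the use of cell-level accuracy alongside exact match are all hypothetical choices. Rendering the grid to an image is omitted.

```python
import random

# Hypothetical palette; the benchmark's actual colors are not specified here.
PALETTE = ["red", "green", "blue", "yellow", "purple", "orange", "black", "white"]


def make_instance(rows, cols, n_colors, seed=0):
    """Build one G2M-style instance: a color grid, a color-to-number mapping,
    and the target matrix the model is expected to output."""
    if not 1 <= n_colors <= len(PALETTE):
        raise ValueError("n_colors must be between 1 and len(PALETTE)")
    rng = random.Random(seed)
    colors = rng.sample(PALETTE, n_colors)
    # Assumed labeling: each chosen color maps to a distinct integer 0..n_colors-1.
    mapping = {c: i for i, c in enumerate(colors)}
    grid = [[rng.choice(colors) for _ in range(cols)] for _ in range(rows)]
    target = [[mapping[c] for c in row] for row in grid]
    return grid, mapping, target


def score(pred, target):
    """Return (exact_match, cell_accuracy) of a predicted matrix vs. the target."""
    total = sum(len(row) for row in target)
    correct = 0
    for r, row in enumerate(target):
        for c, value in enumerate(row):
            # Missing rows or cells in the prediction count as wrong.
            if r < len(pred) and c < len(pred[r]) and pred[r][c] == value:
                correct += 1
    same_shape = len(pred) == len(target) and all(
        len(p) == len(t) for p, t in zip(pred, target)
    )
    exact = correct == total and same_shape
    return exact, correct / total


grid, mapping, target = make_instance(4, 4, 3, seed=1)
pred = [row[:] for row in target]
pred[0][0] = (pred[0][0] + 1) % 3  # corrupt a single cell
print(score(pred, target))  # (False, 0.9375)
```

Growing `rows`, `cols`, and `n_colors` raises visual density while the task stays semantically trivial, which is the knob the benchmark turns.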

Code & Models

Repositories

zhykoties/Grid2Matrix_DigitalAgnosia (GitHub)