Beyond Pixels: Introspective and Interactive Grounding for Visualization Agents

Yiyang Lu; Woong Shin; Ahmad Maroof Karimi; Feiyi Wang; Jie Ren; Evgenia Smirni

arXiv:2604.21134 · cs.CL · April 24, 2026

TL;DR

This paper introduces IVG, a framework that combines spec-based introspection with view-based interaction to improve chart understanding and question-answering accuracy.

Contribution

The paper proposes a framework that lets an agent query a chart's structured specification and manipulate its view, so that visual grounding no longer depends on pixel interpretation alone.

Findings

01 Introspection improves data reconstruction fidelity.

02 Combining interaction with introspection achieves 81% QA accuracy.

03 Questions involving overlapping geometries show a +6.7% gain.

Abstract

Vision-Language Models (VLMs) frequently misread values, hallucinate details, and confuse overlapping elements in charts. Current approaches rely solely on pixel interpretation, creating a Pixel-Only Bottleneck: agents treat interactive charts as static images, losing access to the structured specification that encodes exact values. We introduce Introspective and Interactive Visual Grounding (IVG), a framework that combines (1) spec-grounded introspection, which queries the underlying specification for deterministic evidence, with (2) view-grounded interaction, which manipulates the view to resolve visual ambiguity. To enable evaluation without VLM bias, we present iPlotBench, a benchmark of 500 interactive Plotly figures with 6,706 binary questions and ground-truth specifications. Experiments show that introspection improves data reconstruction fidelity, while the combination with…
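The two grounding modes described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the figure data is made up, and the helper names (`get_trace`, `value_at`, `is_greater`, `isolate_trace`, `zoom_x`) are hypothetical. It assumes only that a Plotly figure can be handled as a plain dictionary with `data` and `layout` keys, and it uses two standard Plotly attributes, a trace's `visible` setting and `layout.xaxis.range`.

```python
import copy

# Hypothetical Plotly-style figure specification (made-up data, plain dict).
figure_spec = {
    "data": [
        {"type": "scatter", "name": "A", "x": [1, 2, 3], "y": [10.0, 12.5, 11.0]},
        {"type": "scatter", "name": "B", "x": [1, 2, 3], "y": [10.2, 12.4, 15.0]},
    ],
    "layout": {"xaxis": {"title": {"text": "step"}}},
}


def get_trace(spec, name):
    """Return the trace dict with the given name (hypothetical helper)."""
    for trace in spec["data"]:
        if trace.get("name") == name:
            return trace
    raise KeyError(name)


def value_at(spec, name, x):
    """Spec-grounded introspection: read the exact y value stored for x."""
    trace = get_trace(spec, name)
    return trace["y"][trace["x"].index(x)]


def is_greater(spec, a, b, x):
    """Answer a binary question from the spec instead of from pixels."""
    return value_at(spec, a, x) > value_at(spec, b, x)


def isolate_trace(spec, name):
    """View-grounded interaction: hide every trace except `name`.

    Returns a modified copy; the original spec is left untouched.
    """
    view = copy.deepcopy(spec)
    for trace in view["data"]:
        if trace.get("name") != name:
            trace["visible"] = "legendonly"
    return view


def zoom_x(spec, lo, hi):
    """View-grounded interaction: restrict the x-axis to [lo, hi]."""
    view = copy.deepcopy(spec)
    view.setdefault("layout", {}).setdefault("xaxis", {})["range"] = [lo, hi]
    return view


if __name__ == "__main__":
    # The two lines nearly overlap at x = 2, but the spec holds exact values.
    print(is_greater(figure_spec, "B", "A", 2))
    print(isolate_trace(figure_spec, "B")["data"][0].get("visible"))
```

In this sketch the question "is B above A at x = 2?" is answered from the stored values 12.4 and 12.5, which would be hard to separate visually. The view helpers return modified copies, so an agent could re-render an isolated or zoomed view without altering the original specification.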
