DV-World: Benchmarking Data Visualization Agents in Real-World Scenarios

Jinxiang Meng; Shaoping Huang; Fangyu Lei; Jingyu Guo; Haoxiang Liu; Jiahao Su; Sihan Wang; Yao Wang; Enrui Wang; Ye Yang; Hongze Chai; Jinming Lv; Anbang Yu; Huangjing Zhang; Yitong Zhang; Yiming Huang; Zeyao Ma; Shizhu He; Jun Zhao; Kang Liu

arXiv:2604.25914 · cs.CL · April 29, 2026

TL;DR

DV-World is a benchmark of 260 tasks designed to evaluate data visualization agents in realistic, complex scenarios across three domains (DV-Sheet, DV-Evolution, and DV-Interact). Its results highlight the limitations of current models.

Contribution

Introduces DV-World, a multi-domain benchmark with a hybrid evaluation framework for assessing data visualization agents on real-world professional tasks.

Findings

01

State-of-the-art models score below 50% overall.

02

The benchmark exposes significant gaps in how current agents handle real-world visualization challenges.

03

DV-World provides a realistic testbed to guide the development of versatile visualization agents.

Abstract

Real-world data visualization (DV) requires native environmental grounding, cross-platform evolution, and proactive intent alignment. Yet existing benchmarks often suffer from code-sandbox confinement, single-language creation-only tasks, and the assumption of perfect intent. To bridge these gaps, we introduce DV-World, a benchmark of 260 tasks designed to evaluate DV agents across real-world professional lifecycles. DV-World spans three domains: DV-Sheet, for native spreadsheet manipulation, including chart and dashboard creation as well as diagnostic repair; DV-Evolution, for adapting and restructuring reference visual artifacts to fit new data across diverse programming paradigms; and DV-Interact, for proactive intent alignment with a user simulator that mimics ambiguous real-world requirements. Our hybrid evaluation framework integrates Table-value Alignment for numerical precision and…
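The abstract names Table-value Alignment as the component that checks numerical precision, but the excerpt is truncated before its definition. As a rough illustration only, the sketch below assumes that the data plotted in a chart can be read back as a table of (series, category) → value cells, and it scores the fraction of reference cells that the agent's output reproduces within a relative tolerance. The function name `table_value_alignment`, the cell representation, the tolerances, and the recall-style scoring rule are all hypothetical; they are not DV-World's published metric.

```python
import math
from typing import Dict, Tuple

# Hypothetical cell key: (series name, category label) of one plotted data point.
Cell = Tuple[str, str]


def table_value_alignment(
    pred: Dict[Cell, float],
    ref: Dict[Cell, float],
    rel_tol: float = 1e-2,
    abs_tol: float = 1e-9,
) -> float:
    """Fraction of reference cells reproduced by the prediction within tolerance.

    Illustrative sketch: the default tolerances and the recall-style scoring
    are assumptions, not the paper's definition of Table-value Alignment.
    """
    if not ref:
        # Vacuous case: an empty reference is matched only by an empty prediction.
        return 1.0 if not pred else 0.0
    matched = 0
    for cell, ref_value in ref.items():
        pred_value = pred.get(cell)  # a missing cell counts as a mismatch
        if pred_value is not None and math.isclose(
            pred_value, ref_value, rel_tol=rel_tol, abs_tol=abs_tol
        ):
            matched += 1
    return matched / len(ref)


# Usage with made-up values: one close match, one wrong value, one missing cell.
ref = {("Revenue", "Q1"): 120.0, ("Revenue", "Q2"): 135.5, ("Cost", "Q1"): 80.0}
pred = {("Revenue", "Q1"): 120.4, ("Revenue", "Q2"): 150.0}
print(f"{table_value_alignment(pred, ref):.3f}")  # prints 0.333
```

Scoring against reference cells only means extra plotted values are not penalized; a real benchmark metric would likely need to handle that case, along with matching renamed series or categories.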


Code & Models

Repositories

DA-Open/DV-World (GitHub)

Datasets

DV-World/dvworld (dataset, 1.5k downloads)
