Text2Vis: A Challenging and Diverse Benchmark for Generating Multimodal Visualizations from Text

Mizanur Rahman; Md Tahmid Rahman Laskar; Shafiq Joty; Enamul Hoque

arXiv:2507.19969 · cs.CL · July 29, 2025

TL;DR

Text2Vis introduces a comprehensive benchmark for evaluating text-to-visualization models across diverse chart types and data queries, highlighting current challenges and proposing a novel agent-critic framework to improve model performance.

Contribution

The paper presents the first extensive benchmark for text-to-visualization, along with a new cross-modal agent-critic framework and automated evaluation methods to advance the field.

Findings

01

Significant performance gaps among existing models.

02

The proposed framework improves visualization quality and answer accuracy.

03

Automated evaluation enables scalable assessment without human annotation.

Abstract

Automated data visualization plays a crucial role in simplifying data interpretation, enhancing decision-making, and improving efficiency. While large language models (LLMs) have shown promise in generating visualizations from natural language, the absence of comprehensive benchmarks limits the rigorous evaluation of their capabilities. We introduce Text2Vis, a benchmark designed to assess text-to-visualization models, covering 20+ chart types and diverse data science queries, including trend analysis, correlation, outlier detection, and predictive analytics. It comprises 1,985 samples, each with a data table, natural language query, short answer, visualization code, and annotated charts. The queries involve complex reasoning, conversational turns, and dynamic data retrieval. We benchmark 11 open-source and closed-source models, revealing significant performance gaps, highlighting key…
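
The abstract lists five components per sample. As a rough illustration only, one record could be modeled as the Python sketch below. The class name `Text2VisSample`, its field names, and the completeness check are hypothetical choices made for this sketch. They are not taken from the released dataset's actual schema, and the toy values in the usage example are invented, not benchmark data.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Text2VisSample:
    """Hypothetical layout for one benchmark sample (all field names are illustrative)."""
    data_table: str          # tabular data, assumed here to be serialized as CSV text
    query: str               # natural language query
    short_answer: str        # concise textual answer to the query
    visualization_code: str  # code that renders the chart
    chart_paths: List[str] = field(default_factory=list)  # annotated chart image files

    def is_complete(self) -> bool:
        """True if every text component is non-empty and at least one chart is attached."""
        texts = [self.data_table, self.query, self.short_answer, self.visualization_code]
        return all(t.strip() for t in texts) and len(self.chart_paths) > 0


if __name__ == "__main__":
    # Toy record, invented for illustration; not a real Text2Vis sample.
    sample = Text2VisSample(
        data_table="year,sales\n2021,10\n2022,14\n",
        query="How did sales change from 2021 to 2022?",
        short_answer="Increased by 4",
        visualization_code="import matplotlib.pyplot as plt\nplt.bar(['2021', '2022'], [10, 14])",
        chart_paths=["chart_0001.png"],
    )
    print(sample.is_complete())  # True
```

Keeping the chart list separate from the text fields mirrors the abstract's distinction between textual components (table, query, answer, code) and the annotated chart images.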

Code & Models

Datasets

mizanurr/Text2Vis

Videos

Text2Vis: A Challenging and Diverse Benchmark for Generating Multimodal Visualizations from Text (Underline)