VisEval: A Benchmark for Data Visualization in the Era of Large Language Models
Nan Chen, Yuge Zhang, Jiahang Xu, Kan Ren, Yuqing Yang

TL;DR
VisEval is a new NL2VIS benchmark that pairs a large dataset (2,524 queries over 146 databases) with automated evaluation methods. Together they assess how well large language models generate visualizations from natural-language queries, addressing the lack of a comprehensive and reliable benchmark for this task.
Contribution
The paper introduces VisEval, a large-scale dataset and automated evaluation framework for natural-language-to-visualization (NL2VIS) tasks, which enables systematic assessment of LLMs' visualization-generation capabilities.
Findings
LLMs face significant challenges in visualization generation.
VisEval's evaluation reveals strengths and weaknesses of current LLMs.
The benchmark facilitates future research in NL2VIS.
Abstract
Translating natural language to visualization (NL2VIS) has shown great promise for visual data analysis, but it remains a challenging task that requires multiple low-level implementations, such as natural language processing and visualization design. Recent advancements in pre-trained large language models (LLMs) are opening new avenues for generating visualizations from natural language. However, the lack of a comprehensive and reliable benchmark hinders our understanding of LLMs' capabilities in visualization generation. In this paper, we address this gap by proposing a new NL2VIS benchmark called VisEval. Firstly, we introduce a high-quality and large-scale dataset. This dataset includes 2,524 representative queries covering 146 databases, paired with accurately labeled ground truths. Secondly, we advocate for a comprehensive automated evaluation methodology covering multiple…
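This summary does not describe how VisEval's automated checkers work. The sketch below is only a hypothetical illustration of one kind of check such a benchmark could run: it compares a generated chart against a labeled ground truth by chart type and by the (x, y) data points the chart encodes, ignoring point order unless told otherwise. The names `BenchmarkItem` and `check_chart`, the record fields, and the example query and database are all assumptions made for illustration, not the paper's data format or API.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkItem:
    """Hypothetical record: one natural-language query paired with its ground truth."""
    query: str
    database: str
    chart_type: str
    points: tuple  # (x, y) pairs the correct chart should encode


def check_chart(item, chart_type, points, ordered=False):
    """Return a list of issues found when comparing a generated chart to the ground truth.

    An empty list means no issues were detected by these (hypothetical) checks.
    """
    issues = []
    if chart_type != item.chart_type:
        issues.append(f"chart type {chart_type!r} != expected {item.chart_type!r}")
    if ordered:
        # Order matters, e.g. when the query asks for sorted bars.
        if list(points) != list(item.points):
            issues.append("data points differ (order-sensitive)")
    elif Counter(points) != Counter(item.points):
        # Multiset comparison: same points with the same multiplicity, any order.
        issues.append("data points differ")
    return issues


# Hypothetical usage: query, database name, and values are invented for illustration.
item = BenchmarkItem(
    query="Show the number of students in each department as a bar chart.",
    database="college",
    chart_type="bar",
    points=(("Math", 12), ("Physics", 7)),
)
print(check_chart(item, "bar", [("Physics", 7), ("Math", 12)]))  # → []
print(check_chart(item, "pie", [("Math", 12)]))
```

Comparing points as a multiset keeps the check independent of the row order a generated query happens to return. A realistic checker would also need numeric tolerance for floating-point values and would cover properties such as validity and readability, which this sketch does not attempt.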
Taxonomy
Topics: Topic Modeling · Data Quality and Management · Computational Physics and Python Applications
