VisEval: A Benchmark for Data Visualization in the Era of Large Language Models

Nan Chen; Yuge Zhang; Jiahang Xu; Kan Ren; Yuqing Yang

arXiv:2407.00981 · cs.HC · August 8, 2024 · 1 citation

TL;DR

VisEval is a new benchmark that provides a large dataset and comprehensive evaluation methods to assess the ability of large language models to generate visualizations from natural language queries, addressing a key gap in the field.

Contribution

The paper introduces VisEval, a large-scale dataset and automated evaluation framework for natural language to visualization tasks, enabling systematic assessment of LLMs' visualization generation capabilities.

Findings

01

LLMs face significant challenges in visualization generation.

02

VisEval's evaluation reveals strengths and weaknesses of current LLMs.

03

The benchmark facilitates future research in NL2VIS.

Abstract

Translating natural language to visualization (NL2VIS) has shown great promise for visual data analysis, but it remains a challenging task that requires multiple low-level implementations, such as natural language processing and visualization design. Recent advancements in pre-trained large language models (LLMs) are opening new avenues for generating visualizations from natural language. However, the lack of a comprehensive and reliable benchmark hinders our understanding of LLMs' capabilities in visualization generation. In this paper, we address this gap by proposing a new NL2VIS benchmark called VisEval. Firstly, we introduce a high-quality and large-scale dataset. This dataset includes 2,524 representative queries covering 146 databases, paired with accurately labeled ground truths. Secondly, we advocate for a comprehensive automated evaluation methodology covering multiple…
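As a rough illustration of the benchmark idea described above (queries paired with labeled ground truths, scored automatically), the following is a minimal sketch only. It is not VisEval's API or its metrics: the abstract is cut off before it describes the evaluation methodology, so the `BenchmarkEntry` and `GeneratedChart` classes, the `is_correct` check, the `pass_rate` function, and the sample data are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkEntry:
    """One hypothetical NL2VIS test case: a query plus its labeled ground truth."""
    query: str
    database: str
    chart_type: str   # e.g. "bar" or "line"
    data: list        # ground-truth (x, y) pairs


@dataclass
class GeneratedChart:
    """Hypothetical summary of what a model-generated chart contains."""
    chart_type: str
    data: list


def is_correct(entry, chart):
    """Compare chart type and plotted data to the ground truth, ignoring row order."""
    return (entry.chart_type == chart.chart_type
            and sorted(entry.data) == sorted(chart.data))


def pass_rate(entries, charts):
    """Fraction of entries whose generated chart matches its ground truth."""
    if not entries:
        return 0.0
    passed = sum(is_correct(e, c) for e, c in zip(entries, charts))
    return passed / len(entries)


# Illustrative data only; these are not entries from the VisEval dataset.
entries = [
    BenchmarkEntry("Show sales per region as a bar chart", "shop",
                   "bar", [("east", 10), ("west", 7)]),
    BenchmarkEntry("Plot yearly revenue as a line chart", "shop",
                   "line", [(2020, 5), (2021, 8)]),
]
charts = [
    GeneratedChart("bar", [("west", 7), ("east", 10)]),   # same data, different order
    GeneratedChart("bar", [(2020, 5), (2021, 8)]),        # wrong chart type
]

print(pass_rate(entries, charts))
```

A single exact-match check like this is deliberately simplistic; it only shows why ground-truth labels make automated scoring possible.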

Code & Models

Repositories

microsoft/VisEval (official)

Taxonomy

Topics: Topic Modeling · Data Quality and Management · Computational Physics and Python Applications