PlotGen-Bench: Evaluating VLMs on Generating Visualization Code from Diverse Plots across Multiple Libraries

Yi Zhao; Zhen Yang; Shuaiqi Duan; Wenmeng Yu; Zhe Su; Jibing Gong; Jie Tang

arXiv:2601.11525·cs.HC·January 21, 2026

Open Access

TL;DR

This paper introduces PlotGen-Bench, a comprehensive benchmark for evaluating vision-language models on their ability to generate complex, multi-library visualization code, revealing significant gaps in current models' visual fidelity and reasoning capabilities.

Contribution

The paper presents a new benchmark, PlotGen-Bench, for assessing VLMs on complex visualization code generation across diverse scenarios and libraries, highlighting current model limitations.

Findings

01. Open-source models lag considerably behind closed-source models in visual fidelity and semantic consistency.

02. Models perform poorly on reasoning-intensive tasks such as chart conversion.

03. The benchmark provides a foundation for improving VLMs on visualization tasks.

Abstract

Recent advances in vision-language models (VLMs) have expanded their multimodal code generation capabilities, yet their ability to generate executable visualization code from plots, especially for complex 3D plots, animations, plot-to-plot transformations, or multi-library scenarios, remains underexplored. To address this gap, we introduce PlotGen-Bench, a comprehensive benchmark for evaluating plot-to-code generation under realistic and complex visualization scenarios. The benchmark spans 9 major categories, 30 subcategories, and 3 core tasks (plot replication, plot transformation, and multi-library generation), covering 2D, 3D, and animated plots across 5 widely used visualization libraries. Through systematic evaluation of state-of-the-art open- and closed-source VLMs, we find that open-source models still lag considerably behind in visual fidelity and semantic consistency, despite…
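The abstract stresses that the generated visualization code must be executable. As a rough illustration only, the sketch below shows one way such a check could look: it writes a model's output to a temporary script, runs it in a fresh interpreter, and records whether it exits cleanly. The names `Task`, `ExecutionResult`, and `run_generated_code`, the timeout value, and the pass criterion (exit code 0) are all assumptions made for this example; the abstract does not describe the benchmark's actual harness or metrics.

```python
import subprocess
import sys
import tempfile
from dataclasses import dataclass
from enum import Enum
from pathlib import Path


class Task(Enum):
    # The three core tasks named in the abstract.
    REPLICATION = "plot replication"
    TRANSFORMATION = "plot transformation"
    MULTI_LIBRARY = "multi-library generation"


@dataclass
class ExecutionResult:
    # Hypothetical record type, not taken from the paper.
    task: Task
    executed: bool
    stderr: str


def run_generated_code(code: str, task: Task, timeout_s: float = 30.0) -> ExecutionResult:
    """Run model-generated code in a fresh interpreter; report whether it exits with code 0."""
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "generated_plot.py"
        script.write_text(code, encoding="utf-8")
        try:
            proc = subprocess.run(
                [sys.executable, str(script)],
                cwd=workdir,          # any files the script saves land in the temp dir
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return ExecutionResult(task, False, "timed out")
    return ExecutionResult(task, proc.returncode == 0, proc.stderr)
```

A subprocess isolates crashes but is not a security sandbox, so untrusted model output would need stronger isolation in practice. Executability is also only a precondition: judging visual fidelity or semantic consistency would require comparing the rendered output against the reference plot, which this sketch does not attempt.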

Taxonomy

Topics: Data Visualization and Analytics · Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis