TL;DR
VisCoder2 is a multi-language visualization coding agent, supported by a large dataset and a benchmark, that substantially improves code generation, execution, and iterative debugging across 12 programming languages.
Contribution
The paper presents three resources for multi-language visualization coding: VisCode-Multi-679K, a large supervised dataset of executable visualization samples; VisPlotBench, a benchmark for evaluating generation and self-debugging; and VisCoder2, a model trained on that dataset.
Findings
VisCoder2 outperforms open-source baselines and approaches the performance of GPT-4.
Iterative self-debugging improves execution pass rates.
At the 32B scale, VisCoder2 achieves an overall execution success rate of 82.4%.
Abstract
Large language models (LLMs) have recently enabled coding agents capable of generating, executing, and revising visualization code. However, existing models often fail in practical workflows due to limited language coverage, unreliable execution, and lack of iterative correction mechanisms. Progress has been constrained by narrow datasets and benchmarks that emphasize single-round generation and single-language tasks. To address these challenges, we introduce three complementary resources for advancing visualization coding agents. VisCode-Multi-679K is a large-scale, supervised dataset containing 679K validated and executable visualization samples with multi-turn correction dialogues across 12 programming languages. VisPlotBench is a benchmark for systematic evaluation, featuring executable tasks, rendered outputs, and protocols for both initial generation and multi-round self-debug.…
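To make the "initial generation plus multi-round self-debug" protocol concrete, here is a minimal sketch of such a loop. It assumes a simple scheme: the model writes code, the code is executed, and on failure the error text is fed back as a new user turn until the code runs or a round cap is hit. The resulting turn list also mirrors the shape of a "multi-turn correction dialogue". All names here (`Turn`, `self_debug`, `exec_in_process`, `pass_rate`, `max_rounds`) are hypothetical and not taken from VisPlotBench or VisCode-Multi-679K. The executor only handles Python in-process, whereas the benchmark covers 12 languages and checks rendered outputs.

```python
import traceback
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Turn:
    role: str      # "user" or "assistant" (hypothetical dialogue schema)
    content: str


@dataclass
class DebugResult:
    passed: bool
    rounds_used: int                      # number of correction rounds attempted
    dialogue: List[Turn] = field(default_factory=list)


def exec_in_process(code: str) -> Tuple[bool, str]:
    """Run Python code in a fresh namespace; return (ok, error text).
    Illustration only: real harnesses should sandbox untrusted code."""
    try:
        exec(compile(code, "<generated>", "exec"), {"__name__": "__main__"})
        return True, ""
    except Exception:
        return False, traceback.format_exc(limit=1)


def self_debug(task: str,
               generate: Callable[[List[Turn]], str],
               execute: Callable[[str], Tuple[bool, str]] = exec_in_process,
               max_rounds: int = 3) -> DebugResult:
    """Initial generation, then up to max_rounds error-feedback corrections."""
    dialogue = [Turn("user", task)]
    code = generate(dialogue)
    dialogue.append(Turn("assistant", code))
    ok, err = execute(code)
    rounds = 0
    while not ok and rounds < max_rounds:
        rounds += 1
        # Feed the execution error back to the model as a new user turn.
        dialogue.append(Turn("user", f"Execution failed:\n{err}\nPlease fix the code."))
        code = generate(dialogue)
        dialogue.append(Turn("assistant", code))
        ok, err = execute(code)
    return DebugResult(ok, rounds, dialogue)


def pass_rate(results: List[DebugResult]) -> float:
    """Fraction of tasks whose final code executed successfully."""
    return sum(r.passed for r in results) / len(results) if results else 0.0


def toy_model(dialogue: List[Turn]) -> str:
    # Hypothetical stand-in for an LLM: buggy first attempt, fixed after feedback.
    attempts = sum(t.role == "assistant" for t in dialogue)
    if attempts == 0:
        return "x = [1, 2, 3]\ny = undefined_name"
    return "x = [1, 2, 3]\ny = sum(x)"


if __name__ == "__main__":
    r = self_debug("Compute the sum of x.", toy_model)
    print(r.passed, r.rounds_used, len(r.dialogue))  # True 1 4
```

Comparing `pass_rate` over the initial attempts alone against `pass_rate` after the debug rounds is one way to quantify the kind of gain the findings attribute to iterative self-debugging. Keeping the executor pluggable lets the same loop wrap a per-language sandbox.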
