VisCoder: Fine-Tuning LLMs for Executable Python Visualization Code Generation
Yuansheng Ni, Ping Nie, Kai Zou, Xiang Yue, Wenhu Chen

TL;DR
VisCoder is a fine-tuned LLM trained on a large dataset of Python visualization code and correction dialogues, significantly improving the reliability and accuracy of plot generation through execution-grounded supervision and iterative self-correction.
Contribution
The paper introduces VisCode-200K, a large dataset for instruction tuning LLMs in visualization, and demonstrates how fine-tuning on this dataset enhances code correctness and visual accuracy.
Findings
VisCoder outperforms open-source baselines on visualization tasks.
Self-correction with runtime feedback improves code reliability.
VisCoder approaches GPT-4o-mini performance on PandasPlotBench.
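The self-correction finding above can be illustrated with a small execute-and-revise loop. This is a minimal sketch under stated assumptions, not the authors' evaluation harness: the names run_snippet, self_correct, and revise are hypothetical, revise stands in for any model call that takes the faulty code plus the captured error text and returns a new attempt, and the default of three rounds is an arbitrary choice rather than a value taken from the paper.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Callable, Tuple


def run_snippet(code: str, timeout: float = 30.0) -> Tuple[bool, str]:
    """Run code in a fresh Python process and return (succeeded, stderr)."""
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "snippet.py"
        script.write_text(code, encoding="utf-8")
        try:
            proc = subprocess.run(
                [sys.executable, str(script)],
                cwd=workdir,
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return False, f"TimeoutExpired: no result after {timeout} seconds"
        return proc.returncode == 0, proc.stderr


def self_correct(
    code: str,
    revise: Callable[[str, str], str],  # hypothetical model call: (code, error) -> new code
    max_rounds: int = 3,                # assumed budget, not taken from the paper
) -> Tuple[str, bool, int]:
    """Return (final_code, succeeded, revisions_used)."""
    for revisions_used in range(max_rounds + 1):
        ok, stderr = run_snippet(code)
        if ok:
            return code, True, revisions_used
        if revisions_used == max_rounds:
            break
        # Feed the runtime error back so the next attempt can address it.
        code = revise(code, stderr)
    return code, False, max_rounds
```

In practice revise would wrap a chat request that appends the traceback to the conversation. Note that a clean exit only shows the code is executable; whether the rendered plot matches the instruction is a separate check that this sketch does not attempt.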
Abstract
Large language models (LLMs) often struggle with visualization tasks such as plotting diagrams and charts, where success depends on both code correctness and visual semantics. Existing instruction-tuning datasets lack execution-grounded supervision and offer limited support for iterative code correction, resulting in fragile and unreliable plot generation. We present VisCode-200K, a large-scale instruction-tuning dataset for Python-based visualization and self-correction. It contains over 200K examples from two sources: (1) validated plotting code from open-source repositories, paired with natural language instructions and rendered plots; and (2) 45K multi-turn correction dialogues from Code-Feedback, enabling models to revise faulty code using runtime feedback. We fine-tune Qwen2.5-Coder-Instruct on VisCode-200K to create VisCoder, and evaluate it on PandasPlotBench. VisCoder significantly…
Taxonomy
Topics: Software Engineering Research · Computational Physics and Python Applications · Data Visualization and Analytics
Methods: ADaptive gradient method with the OPTimal convergence rate
