Aligning Text, Code, and Vision: A Multi-Objective Reinforcement Learning Framework for Text-to-Visualization
Mizanur Rahman, Mohammed Saidul Islam, Md Tahmid Rahman Laskar, Shafiq Joty, and Enamul Hoque

TL;DR
This paper introduces RL-Text2Vis, a reinforcement learning framework that improves text-to-visualization generation by jointly optimizing textual accuracy, code validity, and visualization quality, significantly outperforming existing methods.
Contribution
It presents the first reinforcement learning approach for Text2Vis, using a multi-objective reward and Group Relative Policy Optimization (GRPO) to enhance visualization quality and code execution success.
Findings
22% relative improvement in chart quality over GPT-4o
Code execution success increased from 78% to 97%
Robust generalization to out-of-domain datasets
Abstract
Text-to-Visualization (Text2Vis) systems translate natural language queries over tabular data into concise answers and executable visualizations. While closed-source LLMs generate functional code, the resulting charts often lack semantic alignment and clarity, qualities that can only be assessed post-execution. Open-source models struggle even more, frequently producing non-executable or visually poor outputs. Although supervised fine-tuning can improve code executability, it fails to enhance overall visualization quality, as traditional SFT loss cannot capture post-execution feedback. To address this gap, we propose RL-Text2Vis, the first reinforcement learning framework for Text2Vis generation. Built on Group Relative Policy Optimization (GRPO), our method uses a novel multi-objective reward that jointly optimizes textual accuracy, code validity, and visualization quality using…
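The two ideas named in the abstract, a multi-objective reward and GRPO's group-relative scoring, can be illustrated with a minimal sketch. It is not the authors' implementation: the equal weights, the plain weighted sum, the [0, 1] score ranges, and the function names `combined_reward` and `group_relative_advantages` are all assumptions made for illustration. The only part taken from GRPO as generally described is that each sampled response is scored relative to the mean and standard deviation of rewards within its own group.

```python
from statistics import mean, pstdev

def combined_reward(text_score, code_ok, vis_score, weights=(1/3, 1/3, 1/3)):
    """Hypothetical multi-objective reward: a weighted sum of three components.

    text_score and vis_score are assumed to lie in [0, 1]; code_ok is a bool
    indicating whether the generated code executed. The weights are an
    assumption, not values from the paper.
    """
    w_text, w_code, w_vis = weights
    return w_text * text_score + w_code * (1.0 if code_ok else 0.0) + w_vis * vis_score

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of samples for the same prompt.

    Each advantage is (reward - group mean) / (group std + eps). Using the
    population standard deviation here is an illustrative choice.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four hypothetical candidate outputs for one query:
# (answer correctness, code executed?, chart quality score)
candidates = [
    (1.0, True, 0.9),   # correct answer, runs, good chart
    (1.0, False, 0.0),  # correct answer, code fails, so no chart to score
    (0.0, True, 0.5),   # wrong answer, runs, mediocre chart
    (1.0, True, 0.6),   # correct answer, runs, weaker chart
]
rewards = [combined_reward(*c) for c in candidates]
advantages = group_relative_advantages(rewards)

for c, r, a in zip(candidates, rewards, advantages):
    print(c, round(r, 3), round(a, 3))
```

In this sketch, candidates that beat their group's average reward receive positive advantages and the rest receive negative ones, which is what lets post-execution signals such as chart quality influence the policy update without a separately learned value function.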
Taxonomy
Topics: Data Visualization and Analytics · Multimodal Machine Learning Applications · Topic Modeling
