Chart-RL: Policy Optimization Reinforcement Learning for Enhanced Visual Reasoning in Chart Question Answering with Vision Language Models
Yunfei Bai, Amit Dhanda, Shekhar Jain

TL;DR
This paper introduces Chart-RL, a reinforcement learning framework that improves visual reasoning in chart question answering by optimizing policy and perception, achieving higher accuracy and efficiency.
Contribution
It presents a novel RL-based approach with parameter-efficient fine-tuning for enhanced chart understanding in vision-language models.
Findings
RL fine-tuning raised answer accuracy from 0.580 to 0.634.
Inference time fell from 31 s to 9 s with smaller models.
Chart-RL outperformed baseline models on the ChartQAPro dataset.
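The reported figures can be put in relative terms with a few lines of arithmetic. The sketch below is illustrative only: the function and variable names are hypothetical, and it assumes that the two accuracy values and the two timings each compare like-for-like configurations, which this summary does not confirm.

```python
def relative_change(before: float, after: float) -> float:
    """Return the signed fractional change from `before` to `after`."""
    return (after - before) / before


# Figures as quoted in the findings (accuracy is a fraction, time is in seconds).
acc_before, acc_after = 0.580, 0.634
time_before_s, time_after_s = 31.0, 9.0

acc_abs_gain = acc_after - acc_before                 # absolute accuracy gain
acc_rel_gain = relative_change(acc_before, acc_after)  # gain relative to baseline
speedup = time_before_s / time_after_s                 # how many times faster
time_reduction = -relative_change(time_before_s, time_after_s)  # fraction of time saved

print(f"absolute accuracy gain: {acc_abs_gain:.3f}")
print(f"relative accuracy gain: {acc_rel_gain:.1%}")
print(f"speedup: {speedup:.2f}x")
print(f"inference time reduction: {time_reduction:.1%}")
```

Under those assumptions, the accuracy gain is 0.054 absolute (about 9.3% relative to the baseline), and the drop in inference time corresponds to roughly a 3.4x speedup.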
Abstract
Recent advancements in Vision Language Models (VLMs) have demonstrated progress toward true intelligence requiring robust reasoning capabilities. Beyond pattern recognition, linguistic reasoning must integrate with visual comprehension, particularly for Chart Question Answering (CQA) tasks involving complex data visualizations. Current VLMs face significant limitations in CQA, including imprecise numerical extraction, difficulty interpreting implicit visual relationships, and inadequate attention mechanisms for capturing spatial relationships in charts. In this work, we address these challenges by presenting Chart-RL, a novel reinforcement learning framework that enhances VLMs' chart understanding through feedback-driven policy optimization of visual perception and logical inference. Our key innovation includes a comprehensive framework integrating Reinforcement Learning (RL) from…
