SketchVL: Policy Optimization via Fine-Grained Credit Assignment for Chart Understanding and More

Muye Huang; Lingling Zhang; Yifei Li; Yaqiang Wu; Jun Liu

arXiv:2601.05688·cs.CV·January 12, 2026

TL;DR

SketchVL introduces a fine-grained reinforcement learning approach that draws intermediate reasoning steps as annotations on the image, significantly improving automated chart understanding through better credit assignment during training.

Contribution

The paper proposes SketchVL, a novel multimodal large language model trained with FinePO, a new RL algorithm that assigns credit at the level of individual reasoning steps.

Findings

01

Achieves a 7.23% performance improvement over base models.

02

Effectively aligns step-level reasoning with FinePRM scores.

03

Demonstrates robustness across chart, natural image, and math datasets.

Abstract

Charts are high-density visual carriers of complex data and a medium for information extraction and analysis. Because it requires precise and complex visual reasoning, automated chart understanding poses a significant challenge to existing Multimodal Large Language Models (MLLMs). Many MLLMs trained with reinforcement learning (RL) face the challenge of credit assignment: their advantage estimation, typically performed at the trajectory level, cannot distinguish between correct and incorrect reasoning steps within a single generated response. To address this limitation, we introduce SketchVL, a novel MLLM that is optimized with FinePO, a new RL algorithm designed for fine-grained credit assignment within each trajectory. SketchVL's methodology involves drawing its intermediate reasoning steps as markers on the image and feeding the annotated image back to itself, creating a robust,…
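The credit assignment problem described in the abstract can be illustrated with a small sketch. This is not the FinePO algorithm, whose details are not given here. It assumes a group-normalized trajectory-level advantage (one scalar per response) and a hypothetical per-step score in [0, 1], such as a process reward model might produce. The function names and the redistribution rule are illustrative assumptions only.

```python
from statistics import mean, pstdev


def trajectory_advantages(rewards):
    """Group-normalized advantage: one scalar per sampled response.

    Every step inside a response shares this single value, so a wrong
    step in a correct response is rewarded just like a correct step.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0:
        # All responses scored the same: no learning signal.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]


def step_advantages(traj_adv, step_scores):
    """Hypothetical step-level redistribution (an assumption, not FinePO).

    Each step receives the trajectory advantage shifted by how far its
    score lies from the mean step score, so the average over steps
    still equals the trajectory-level advantage.
    """
    mean_score = mean(step_scores)
    return [traj_adv + (s - mean_score) for s in step_scores]


# Two sampled responses: the first answered correctly, the second did not.
advs = trajectory_advantages([1.0, 0.0])

# Hypothetical step scores for the first response: the second step is weak.
fine = step_advantages(advs[0], [0.9, 0.1, 0.8])
```

With trajectory-level estimation all three steps of the first response would share one advantage. In the sketch, the weak second step receives a smaller value than its neighbors while the mean over steps is unchanged.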

Taxonomy

Topics: Multimodal Machine Learning Applications · Data Visualization and Analytics · Explainable Artificial Intelligence (XAI)