V-tableR1: Process-Supervised Multimodal Table Reasoning with Critic-Guided Policy Optimization

Yubo Jiang; Yitong An; Xin Yang; Abudukelimu Wuerkaixi; Xuxin Cheng; Fengying Xie; Zhiguo Jiang; Cao Liu; Ke Zeng; Haopeng Zhang

arXiv:2604.20755·cs.AI·April 23, 2026

TL;DR

V-tableR1 is a process-supervised reinforcement learning framework that improves how multimodal large language models reason over tables. A critic VLM gives step-level feedback on the policy model's visual chain-of-thought, and a new policy optimization algorithm is proposed to train the system. It reaches state-of-the-art accuracy among open-source models.

Contribution

The paper presents V-tableR1, an RL framework whose critic-guided policy optimization method improves verifiable reasoning by multimodal models on tabular data.

Findings

01 V-tableR1 outperforms larger models on tabular benchmarks.

02 It reduces visual hallucinations and shortcut reasoning.

03 It achieves state-of-the-art accuracy among open-source models.

Abstract

We introduce V-tableR1, a process-supervised reinforcement learning framework that elicits rigorous, verifiable reasoning from multimodal large language models (MLLMs). Current MLLMs trained solely on final outcomes often treat visual reasoning as a black box, relying on superficial pattern matching rather than performing rigorous multi-step inference. While Reinforcement Learning with Verifiable Rewards could enforce transparent reasoning trajectories, extending it to visual domains remains severely hindered by the ambiguity of grounding abstract logic into continuous pixel space. We solve this by leveraging the deterministic grid structure of tables as an ideal visual testbed. V-tableR1 employs a specialized critic VLM to provide dense, step-level feedback on the explicit visual chain-of-thought generated by a policy VLM. To optimize this system, we propose Process-Guided Direct…
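
To illustrate why a table's grid makes step-level feedback checkable, here is a minimal Python sketch. It is not the paper's method: a simple rule-based checker stands in for the critic VLM, and the `Step` schema, `score_steps`, `combined_reward`, and the `process_weight` blend are all hypothetical names and choices introduced only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One reasoning step that cites a table cell (hypothetical schema)."""
    row: int
    col: int
    claimed_value: str


def score_steps(table, steps):
    """Return 1.0 for each step whose cited cell matches the table, else 0.0."""
    scores = []
    for step in steps:
        in_bounds = (
            0 <= step.row < len(table)
            and 0 <= step.col < len(table[step.row])
        )
        ok = in_bounds and table[step.row][step.col] == step.claimed_value
        scores.append(1.0 if ok else 0.0)
    return scores


def combined_reward(step_scores, answer_correct, process_weight=0.5):
    """Blend the mean step score with the outcome reward.

    The linear blend and its weight are assumptions made for illustration.
    """
    process = sum(step_scores) / len(step_scores) if step_scores else 0.0
    outcome = 1.0 if answer_correct else 0.0
    return process_weight * process + (1.0 - process_weight) * outcome


# Toy table: the second step misreads "95" as "59".
table = [["Region", "Sales"], ["North", "120"], ["South", "95"]]
steps = [Step(1, 1, "120"), Step(2, 1, "59")]

step_scores = score_steps(table, steps)
reward = combined_reward(step_scores, answer_correct=True)
```

In this toy setup, a trajectory that reaches the right answer through a misread cell earns less than one whose every step is grounded in the table. This is the kind of signal that outcome-only rewards cannot provide.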

Code & Models

Repositories

arxiv-to-code/arxiv-260420755-v-tabler1-process-supervised-multimodal-table-reasoning-with (GitHub)
