PROPA: Toward Process-level Optimization in Visual Reasoning via Reinforcement Learning

Yanbei Jiang; Chao Lei; Yihao Ding; Krista Ehinger; Jey Han Lau

arXiv:2511.10279·cs.CV·November 14, 2025

TL;DR

PROPA is a process-level optimization framework that combines Monte Carlo Tree Search with reinforcement learning to improve multi-step visual reasoning in vision-language models, with reported gains of up to 17.0% on in-domain tasks and up to 21.0% on out-of-domain tasks.

Contribution

The paper presents a framework that integrates MCTS with GRPO to produce dense, process-level rewards, so that reasoning can be optimized stably, step by step, without human annotations.

Findings

1. Up to 17.0% gains on in-domain tasks
2. Up to 21.0% gains on out-of-domain tasks
3. Consistent outperformance over baselines across benchmarks

Abstract

Despite significant progress, Vision-Language Models (VLMs) still struggle with complex visual reasoning, where multi-step dependencies cause early errors to cascade through the reasoning chain. Existing post-training paradigms are limited: Supervised Fine-Tuning (SFT) relies on costly step-level annotations, while Reinforcement Learning with Verifiable Rewards (RLVR) methods like GRPO provide only sparse, outcome-level feedback, hindering stable optimization. We introduce PROPA (Process-level Reasoning Optimization with interleaved Policy Alignment), a novel framework that integrates Monte Carlo Tree Search (MCTS) with GRPO to generate dense, process-level rewards and optimize reasoning at each intermediate step without human annotations. To overcome the cold-start problem, PROPA interleaves GRPO updates with SFT, enabling the model to learn from both successful and failed reasoning…
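
The idea of dense, process-level rewards can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes that each candidate reasoning step is scored by the fraction of rollouts from that step that reach a correct final answer, and that sibling steps are then compared with a GRPO-style group-relative normalization (reward minus group mean, divided by group standard deviation). The function names, the rollout data, and the use of the population standard deviation are all hypothetical choices made for illustration.

```python
from statistics import mean, pstdev


def step_value(rollout_outcomes):
    """Monte Carlo value of a step: share of rollouts that end in a correct answer.

    rollout_outcomes is a list of 0/1 flags (hypothetical data, 1 = correct).
    """
    return sum(rollout_outcomes) / len(rollout_outcomes)


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style normalization within one group of sibling candidates.

    Assumption: population standard deviation, with eps to avoid division by zero.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Three hypothetical candidate steps that share the same parent state,
# each with four simulated rollouts to a final answer.
sibling_rollouts = [
    [1, 1, 0, 1],
    [0, 0, 1, 0],
    [1, 0, 1, 0],
]

values = [step_value(outcomes) for outcomes in sibling_rollouts]
advantages = group_relative_advantages(values)

for v, a in zip(values, advantages):
    print(f"value={v:.2f} advantage={a:+.3f}")
```

In this sketch every intermediate step receives its own signal, in contrast to outcome-level feedback, where only the final answer is scored. A step whose rollouts succeed more often than its siblings gets a positive advantage, and one that succeeds less often gets a negative one.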

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning