PDCR: Perception-Decomposed Confidence Reward for Vision-Language Reasoning
Hee Suk Yoon, Eunseop Yoon, Ji Woo Hong, SooHwan Eom, Gwanhyeong Koo, Mark Hasegawa-Johnson, Qi Dai, Chong Luo, Chang D. Yoo

TL;DR
This paper introduces PDCR, a reward framework that decomposes confidence signals into perception and reasoning components to improve vision-language reasoning training.
Contribution
PDCR employs unsupervised skill decomposition and intra-cluster normalization to better align rewards with heterogeneous visual and textual reasoning tasks.
Findings
PDCR outperforms naive global-reward methods on vision-language benchmarks.
Intra-cluster normalization stabilizes confidence signals for perception and reasoning.
Skill decomposition improves the effectiveness of reinforcement learning in vision-language (V-L) reasoning.
Abstract
Reinforcement Learning with Verifiable Rewards (RLVR) traditionally relies on a sparse, outcome-based signal. Recent work shows that providing a fine-grained, model-intrinsic signal (rewarding the confidence growth in the ground-truth answer) effectively improves language reasoning training by providing step-level guidance without costly external models. While this reward is effective for unimodal text, we find that naively applying it globally to vision-language (V-L) reasoning is a suboptimal strategy, as the task is a heterogeneous mix of sparse visual perception and dense textual reasoning. Normalizing the reward globally creates mixture-induced signal degradation, where the training signal for visual steps is statistically distorted by the predominant textual steps. We propose Perception-Decomposed Confidence Reward (PDCR), a framework that solves this by aligning the reward structure with the…
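The contrast between global and intra-cluster normalization described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the z-score form of normalization, the function names, the hand-assigned cluster labels (the paper derives them with unsupervised skill decomposition, whose details are not given here), and the toy numbers are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def confidence_growth(confidences):
    """Per-step reward: change in ground-truth answer confidence after each step.

    `confidences` is a hypothetical list of the model's confidence in the
    ground-truth answer, measured before the first step and after every step.
    """
    return [after - before for before, after in zip(confidences, confidences[1:])]


def zscore(values, eps=1e-8):
    """Standardize values to zero mean and (roughly) unit variance."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / (sigma + eps) for v in values]


def global_normalize(rewards):
    """Assumed baseline: normalize all step rewards together."""
    return zscore(rewards)


def intra_cluster_normalize(rewards, clusters):
    """Normalize each step reward only against steps in the same cluster."""
    out = [0.0] * len(rewards)
    for label in set(clusters):
        idx = [i for i, c in enumerate(clusters) if c == label]
        for i, v in zip(idx, zscore([rewards[i] for i in idx])):
            out[i] = v
    return out


# Toy example (fabricated numbers, for illustration only): two sparse
# perception steps with small rewards among six reasoning steps.
rewards = [0.01, 0.20, 0.30, 0.03, 0.10, 0.40, 0.25, 0.35]
clusters = ["perception", "reasoning", "reasoning", "perception",
            "reasoning", "reasoning", "reasoning", "reasoning"]

global_scores = global_normalize(rewards)
cluster_scores = intra_cluster_normalize(rewards, clusters)

# Indices 0 and 3 are the perception steps.
print("global:       ", [round(global_scores[i], 2) for i in (0, 3)])
print("intra-cluster:", [round(cluster_scores[i], 2) for i in (0, 3)])
```

In this toy setup, global normalization pushes both perception steps below the mean, so the better perception step is still penalized. Normalizing within the perception cluster gives the better step a positive score and the worse one a negative score, which is the kind of distortion the abstract attributes to mixing sparse visual steps with dense textual ones.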
