PDCR: Perception-Decomposed Confidence Reward for Vision-Language Reasoning

Hee Suk Yoon; Eunseop Yoon; Ji Woo Hong; SooHwan Eom; Gwanhyeong Koo; Mark Hasegawa-Johnson; Qi Dai; Chong Luo; Chang D. Yoo

arXiv:2605.13467·cs.CL·May 14, 2026

TL;DR

This paper introduces PDCR, a reward framework that decomposes confidence signals into perception and reasoning components to improve vision-language reasoning training.

Contribution

PDCR employs unsupervised skill decomposition and intra-cluster normalization to better align rewards with heterogeneous visual and textual reasoning tasks.

Findings

01

PDCR outperforms naive global-reward methods on vision-language benchmarks.

02

Intra-cluster normalization stabilizes confidence signals for perception and reasoning.

03

Skill decomposition improves the effectiveness of reinforcement learning in V-L reasoning.

Abstract

Reinforcement Learning with Verifiable Rewards (RLVR) traditionally relies on a sparse, outcome-based signal. Recent work shows that a fine-grained, model-intrinsic signal (rewarding the confidence growth in the ground-truth answer) improves language reasoning training by providing step-level guidance without costly external models. Although this reward is effective for unimodal text, we find that naively applying it globally to vision-language (V-L) reasoning is a suboptimal strategy, as the task is a heterogeneous mix of sparse visual perception and dense textual reasoning. Normalizing the reward globally creates mixture-induced signal degradation, where the training signal for visual steps is statistically distorted by the predominant textual steps. We propose Perception-Decomposed Confidence Reward (PDCR), a framework that solves this by aligning the reward structure with the…
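
The contrast between global and intra-cluster normalization can be illustrated with a toy sketch. This is not the authors' implementation: the reward values are made up, the cluster labels are assigned by hand (the paper's unsupervised skill decomposition is not reproduced here), z-scoring is an assumed choice of normalization, and the function names `global_normalize` and `intra_cluster_normalize` are hypothetical.

```python
import numpy as np

EPS = 1e-8  # guards against division by zero for constant groups


def global_normalize(rewards):
    """Z-score all step rewards together, ignoring step type."""
    rewards = np.asarray(rewards, dtype=float)
    return (rewards - rewards.mean()) / (rewards.std() + EPS)


def intra_cluster_normalize(rewards, clusters):
    """Z-score each step reward against the steps in its own cluster."""
    rewards = np.asarray(rewards, dtype=float)
    clusters = np.asarray(clusters)
    normalized = np.empty_like(rewards)
    for label in np.unique(clusters):
        mask = clusters == label
        group = rewards[mask]
        normalized[mask] = (group - group.mean()) / (group.std() + EPS)
    return normalized


# Hypothetical per-step confidence-growth rewards for one batch of steps.
# Cluster 0: two sparse "perception" steps; cluster 1: six dense "reasoning" steps.
rewards = np.array([0.30, 0.40, 0.01, 0.02, 0.03, 0.02, 0.01, 0.03])
clusters = np.array([0, 0, 1, 1, 1, 1, 1, 1])

global_scores = global_normalize(rewards)
cluster_scores = intra_cluster_normalize(rewards, clusters)

print("global:       ", np.round(global_scores, 2))
print("intra-cluster:", np.round(cluster_scores, 2))
```

In this toy batch, global normalization scores both perception steps as above average because the mean is dominated by the more numerous reasoning steps, so the weaker perception step is not distinguished from the stronger one by sign. Normalizing within each cluster compares perception steps only with other perception steps, which is the kind of separation the abstract motivates.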
