Thinking with Deltas: Incentivizing Reinforcement Learning via Differential Visual Reasoning Policy
Shujian Gao, Yuan Wang, Jiangtao Yan, Zuxuan Wu, Yu-Gang Jiang

TL;DR
This paper introduces Thinking with Deltas, a reinforcement learning framework that pushes models to rely on visual evidence. It does so by maximizing how far the model's reasoning diverges when the visual input is masked and minimizing how far it diverges when the visual input is perturbed, thereby improving visual reasoning.
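The two divergence terms in the TL;DR can be illustrated with a small sketch. It is not the paper's implementation: the choice of KL divergence, the weights `alpha` and `beta`, and the function names are all assumptions made for illustration, and the toy logits stand in for a model's output distribution under the original, masked, and perturbed image.

```python
import math

def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def delta_objective(logits_orig, logits_masked, logits_perturbed,
                    alpha=1.0, beta=1.0):
    """Hypothetical intrinsic score: high when masking the image changes
    the output a lot (sensitivity) and perturbing it changes the output
    little (robustness). KL, alpha and beta are assumptions, not taken
    from the source."""
    p = softmax(logits_orig)
    sensitivity = kl_divergence(p, softmax(logits_masked))
    robustness_gap = kl_divergence(p, softmax(logits_perturbed))
    score = alpha * sensitivity - beta * robustness_gap
    return score, sensitivity, robustness_gap

# Toy example: masking flattens the distribution, perturbation barely moves it.
score, sens, gap = delta_objective(
    logits_orig=[2.0, 0.5, -1.0],
    logits_masked=[0.0, 0.0, 0.0],
    logits_perturbed=[1.9, 0.6, -1.0],
)
print(score > 0, sens > gap)
```

A model that ignores the image would produce nearly the same distribution with and without masking, so the sensitivity term would be close to zero and the score would not reward it.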
Contribution
The paper proposes the Differential Visual Reasoning Policy (DVRP), a new intrinsic supervision method that enhances visual understanding in multimodal models by aligning reasoning with visual information changes.
Findings
DVRP significantly outperforms state-of-the-art methods on various benchmarks.
Models trained with DVRP demonstrate increased visual sensitivity and robustness.
The approach improves reasoning in both general and medical domains.
Abstract
Reinforcement Learning with Verifiable Rewards (RLVR) has significantly advanced reasoning capabilities in Large Language Models. However, adapting RLVR to multimodal domains suffers from a critical perception-reasoning decoupling. Existing paradigms, driven by text-centric outcome rewards and reasoning in a language medium, inadvertently encourage models to bypass visual perception. We empirically validate this through blind experiments: state-of-the-art policies maintain or, surprisingly, improve performance even when visual inputs are entirely removed. This reveals that these models degenerate into blind reasoners, exploiting linguistic priors to generate plausible answers instead of attending to visual evidence. In response, we propose Thinking with Deltas, a framework driven by a Differential Visual Reasoning Policy (DVRP). DVRP introduces intrinsic…
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling
