Thinking with Deltas: Incentivizing Reinforcement Learning via Differential Visual Reasoning Policy

Shujian Gao; Yuan Wang; Jiangtao Yan; Zuxuan Wu; Yu-Gang Jiang

arXiv:2601.06801 · cs.AI · January 13, 2026

TL;DR

This paper introduces Thinking with Deltas, a reinforcement learning framework that incentivizes models to ground their reasoning in visual evidence. It maximizes the divergence between reasoning on the original input and reasoning on masked inputs, and minimizes the divergence between reasoning on the original input and reasoning on perturbed inputs, thereby improving visual reasoning.

Contribution

The paper proposes the Differential Visual Reasoning Policy (DVRP), a new intrinsic supervision method that enhances visual understanding in multimodal models by aligning reasoning with visual information changes.

Findings

01. DVRP significantly outperforms state-of-the-art methods on various benchmarks.

02. Models trained with DVRP demonstrate increased visual sensitivity and robustness.

03. The approach improves reasoning in both general and medical domains.

Abstract

Reinforcement Learning with Verifiable Rewards (RLVR) has significantly advanced reasoning capabilities in Large Language Models. However, adapting RLVR to multimodal domains suffers from a critical perception-reasoning decoupling. Existing paradigms, driven by text-centric outcome rewards and by reasoning in the language medium, inadvertently encourage models to bypass visual perception. We empirically validate this through blind experiments: state-of-the-art policies maintain or, surprisingly, improve performance even when visual inputs are entirely removed. This reveals that these models degenerate into blind reasoners, exploiting linguistic priors to generate plausible answers instead of attending to visual evidence. In response, we propose Thinking with Deltas, a framework driven by a Differential Visual Reasoning Policy (DVRP). DVRP introduces intrinsic…
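
The summary above describes DVRP as rewarding divergence from masked inputs and penalizing divergence from perturbed inputs. The sketch below illustrates that idea only; it is not the authors' implementation. It assumes the divergence is a KL divergence over a single output distribution, and the function names, the weights `alpha` and `beta`, and the way the two terms are combined are all hypothetical, since the page gives none of these details.

```python
import math


def softmax(logits):
    """Convert raw logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def differential_bonus(logits_orig, logits_masked, logits_perturbed,
                       alpha=1.0, beta=1.0):
    """Hypothetical intrinsic signal, not taken from the paper.

    sensitivity: how much the output changes when the image is masked
                 (to be maximized, so a blind reasoner scores low).
    robustness:  how much the output changes under a mild perturbation
                 (to be minimized, so an unstable model is penalized).
    alpha and beta are assumed weights.
    """
    p_orig = softmax(logits_orig)
    p_mask = softmax(logits_masked)
    p_pert = softmax(logits_perturbed)
    sensitivity = kl_divergence(p_orig, p_mask)
    robustness = kl_divergence(p_orig, p_pert)
    return alpha * sensitivity - beta * robustness


# A visually grounded model: masking changes the output, perturbation does not.
grounded = differential_bonus([2.0, 0.0, 0.0], [0.0, 0.0, 0.0], [2.0, 0.0, 0.0])

# A blind reasoner: masking changes nothing, so there is no sensitivity to reward.
blind = differential_bonus([2.0, 0.0, 0.0], [2.0, 0.0, 0.0], [2.0, 0.0, 0.0])
```

Under these assumptions, `grounded` is positive and `blind` is zero, which matches the stated goal of rewarding models whose reasoning actually depends on the image. In a real RLVR setup such a term would presumably be combined with the verifiable outcome reward, but how DVRP does so is not stated here.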

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling