Bridging Visual Representation and Reinforcement Learning from Verifiable Rewards in Large Vision-Language Models

Yuhang Han; Yuyang Wu; Zhengbo Jiao; Yiyu Wang; Xuyang Liu; Shaobo Wang; Hanlin Xu; Xuming Hu; Linfeng Zhang

arXiv:2603.27375 · cs.CV · March 31, 2026

TL;DR

This paper introduces KAWHI, a reward reweighting mechanism that enhances large vision-language models by explicitly integrating structured visual information into reinforcement learning, improving multimodal reasoning.

Contribution

KAWHI provides a novel, plug-and-play method for incorporating visual structure into reward optimization, boosting reasoning performance in LVLMs.

Findings

01 KAWHI consistently improves performance on reasoning benchmarks across models.

02 KAWHI localizes salient visual regions, improving alignment.

03 KAWHI strengthens the coupling between visual evidence and reasoning steps.

Abstract

Reinforcement Learning from Verifiable Rewards (RLVR) has substantially enhanced the reasoning capabilities of large language models in abstract reasoning tasks. However, its application to Large Vision-Language Models (LVLMs) remains constrained by a structural representational bottleneck. Existing approaches generally lack explicit modeling and effective utilization of visual information, preventing visual representations from being tightly coupled with the reinforcement learning optimization process and thereby limiting further improvements in multimodal reasoning performance. To address this limitation, we propose KAWHI (Key-Region Aligned Weighted Harmonic Incentive), a plug-and-play reward reweighting mechanism that explicitly incorporates structured visual information into uniform reward policy optimization methods (e.g., GRPO and GSPO). The method adaptively localizes…
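To make the phrase "uniform reward" concrete, here is a minimal sketch of how GRPO-style methods assign credit: each sampled response gets one group-normalized advantage, and that same value is applied to every token in the response. The second half shows one generic way a per-token reweighting could plug into that step. The abstract above is truncated, so this is not KAWHI's actual formula; the function `reweight_tokens`, its `weights` argument, and the mean-one normalization are assumptions made purely for illustration, as is the use of the population standard deviation.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: normalize each response's scalar reward
    against the other responses sampled for the same prompt.
    (Population std is an assumption; implementations differ.)"""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def broadcast_uniform(advantage, num_tokens):
    """Uniform credit assignment: every token in the response
    receives the same advantage value."""
    return [advantage] * num_tokens


def reweight_tokens(advantage, weights):
    """HYPOTHETICAL reweighting, not the paper's method: scale the
    shared advantage by per-token weights normalized to mean 1, so
    the total credit over the response is unchanged."""
    total = sum(weights)
    n = len(weights)
    if total <= 0:
        # Fall back to uniform credit if the weights carry no signal.
        return [advantage] * n
    return [advantage * w * n / total for w in weights]


# Four sampled responses to one prompt, scored by a verifiable reward.
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])

# Uniform baseline: three tokens, identical credit.
uniform = broadcast_uniform(advantages[0], 3)

# Hypothetical weights that emphasize the third token.
weighted = reweight_tokens(advantages[0], [1.0, 1.0, 2.0])
```

In this sketch the weights only redistribute credit within a response, since they are normalized to average one. How KAWHI derives its weights from key visual regions, and how it combines them, is described in the paper itself rather than in the truncated abstract.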

Code & Models

Repositories

https://kawhiiiileo.github.io/KAWHI_PAGE