Not All Tokens See Equally: Perception-Grounded Policy Optimization for Large Vision-Language Models

Zekai Ye; Qiming Li; Xiaocheng Feng; Ruihan Chen; Ziming Li; Haoyu Ren; Kun Chen; Dandan Tu; Bing Qin

arXiv:2604.01840 · cs.AI · April 9, 2026


TL;DR

This paper introduces PGPO, a fine-grained policy optimization method that improves multimodal reasoning in large vision-language models by amplifying the learning signal on visually grounded tokens. It improves performance by 18.7% on average across benchmarks.

Contribution

The paper proposes PGPO, a new token-level advantage reshaping framework that improves learning signals for visually-dependent tokens in large vision-language models.
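The following minimal Python/NumPy sketch makes "token-level advantage reshaping" concrete. It assumes a GRPO-style, group-normalized sequence advantage and a non-negative visual-dependency score for each token. The weighting rule in `reshape_advantages`, its `alpha` mixing parameter, and both function names are hypothetical illustrations. This page does not specify the rule PGPO actually uses.

```python
import numpy as np


def group_advantages(rewards, eps: float = 1e-6) -> np.ndarray:
    """GRPO-style sequence advantages: rewards normalized within a sampled group."""
    rewards = np.asarray(rewards, dtype=np.float64)
    return (rewards - rewards.mean()) / (rewards.std() + eps)


def reshape_advantages(seq_advantage: float, dependency, alpha: float = 0.5) -> np.ndarray:
    """Spread one sequence advantage over its tokens, weighted by visual dependency.

    HYPOTHETICAL weighting (not taken from the paper):
        w_t = (1 - alpha) + alpha * d_t / mean(d)
    For 0 <= alpha <= 1 and d_t >= 0 the weights are non-negative and average to 1,
    so the mean per-token advantage equals the sequence advantage.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    d = np.asarray(dependency, dtype=np.float64)
    mean_d = d.mean()
    if mean_d <= 0.0:
        # No token depends on the image: fall back to uniform credit.
        return np.full(d.shape, seq_advantage)
    weights = (1.0 - alpha) + alpha * d / mean_d
    return seq_advantage * weights


# One rollout group of 4 responses with binary verifiable rewards.
adv = group_advantages(np.array([1.0, 0.0, 1.0, 0.0]))
# Hypothetical dependency scores for the 5 tokens of response 0.
dep = np.array([0.01, 0.02, 2.5, 0.01, 0.06])
token_adv = reshape_advantages(adv[0], dep, alpha=0.8)
print(adv)
print(token_adv)
```

With `alpha = 0`, every token receives the same advantage, which is the uniform scheme the abstract criticizes. Raising `alpha` shifts credit toward visually dependent tokens while keeping the mean per-token advantage equal to the sequence advantage.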

Findings

01

PGPO boosts model performance by 18.7% on average across benchmarks.

02

PGPO reduces gradient variance and prevents training collapse.

03

PGPO acts as an effective regularizer for perception-grounded reasoning.

Abstract

While Reinforcement Learning from Verifiable Rewards (RLVR) has advanced reasoning in Large Vision-Language Models (LVLMs), prevailing frameworks suffer from a foundational methodological flaw: by assigning the same advantage to every generated token, they dilute the learning signals needed to optimize the critical, visually grounded steps of multimodal reasoning. To address this, we formulate Token Visual Dependency, which quantifies the causal information gain of visual inputs as the Kullback-Leibler (KL) divergence between visual-conditioned and text-only predictive distributions. Observing that this dependency is highly sparse and semantically pivotal, we introduce Perception-Grounded Policy Optimization (PGPO), a novel fine-grained credit assignment framework that dynamically reshapes advantages at the token level. Through a…
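Token Visual Dependency, as described above, can be computed from two forward passes over the same generated tokens. The sketch below assumes you already have next-token logits from a pass conditioned on image and text (`logits_with_image`) and from a text-only pass (`logits_text_only`), for example with the image removed. This page does not state how the paper builds the text-only input. The sketch also assumes the KL direction is KL(p_vis || p_text). All function and variable names are illustrative.

```python
import numpy as np


def log_softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable log-softmax over the last (vocabulary) axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def token_visual_dependency(logits_with_image, logits_text_only) -> np.ndarray:
    """Per-token KL(p_vis || p_text), one score per generated position.

    Both arguments have shape (seq_len, vocab_size) and hold next-token logits
    for the same response, computed with and without the visual input.
    The KL direction is an assumption, not confirmed by the source.
    """
    log_p = log_softmax(np.asarray(logits_with_image, dtype=np.float64))
    log_q = log_softmax(np.asarray(logits_text_only, dtype=np.float64))
    p = np.exp(log_p)
    # KL(p || q) = sum_v p(v) * (log p(v) - log q(v)), always >= 0
    return (p * (log_p - log_q)).sum(axis=-1)


# Toy example: 3 positions, vocabulary of 4 tokens.
rng = np.random.default_rng(0)
vis = rng.normal(size=(3, 4))
txt = vis.copy()
txt[1] += np.array([3.0, -3.0, 0.0, 0.0])  # only position 1 changes without the image
scores = token_visual_dependency(vis, txt)
print(scores)
```

Tokens whose predictions barely change when the image is dropped score near zero. The abstract's observation that this dependency is highly sparse means most positions are expected to fall in that regime, with only a few scoring high.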

Code & Models

Repositories

Yzk1114/PGPO (GitHub)
