Credit Where It is Due: Cross-Modality Connectivity Drives Precise Reinforcement Learning for MLLM Reasoning
Zhengbo Jiao, Shaobo Wang, Zifan Zhang, Wei Wang, Bing Zhao, Hu Wei, Linfeng Zhang

TL;DR
This paper investigates how visual information is integrated during reasoning in multimodal large language models. It shows that a small subset of tokens with strong visual-textual coupling serves as anchors, and it introduces a reinforcement learning framework that strengthens this grounding.
Contribution
It uncovers the role of high-connectivity tokens as anchors in multimodal reasoning and proposes Anchor-Token Reinforcement Learning (AT-RL) to improve visual grounding with minimal overhead.
Findings
High-connectivity tokens are key anchors in reasoning.
AT-RL improves model performance on multiple tasks.
Focusing on low-connectivity tokens degrades reasoning quality.
Abstract
Reinforcement Learning with Verifiable Rewards (RLVR) has significantly advanced the reasoning capabilities of Multimodal Large Language Models (MLLMs), yet how visual evidence is integrated during reasoning remains poorly understood. We explore multimodal RLVR through the lens of cross-modal attention connectivity and find that only a small fraction of tokens (approximately 15%) exhibit strong visual-textual coupling. These high-connectivity tokens act as anchors that ground reasoning in the image, while the majority follow linguistic patterns. During RLVR training, credit assignment naturally concentrates on these anchors, sharpening their visual grounding over time. Building on this insight, we propose Anchor-Token Reinforcement Learning (AT-RL), a lightweight framework that selectively reinforces high-connectivity tokens via graph-based clustering of attention topology. Evaluated…
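The idea of scoring tokens by cross-modal connectivity and reinforcing only the strongest ones can be illustrated with a small sketch. This is not the authors' implementation: the abstract describes graph-based clustering of attention topology, whereas the code below substitutes a plain top-fraction threshold on the attention mass each generated token places on image positions. The function names, the `fraction=0.15` default (taken from the "approximately 15%" figure), and the single sequence-level advantage are all assumptions made for illustration.

```python
import numpy as np

def visual_connectivity(attn, visual_mask):
    """Attention mass each generated token places on image positions.

    attn: (T, S) array, attention from T generated tokens to S context positions.
    visual_mask: (S,) boolean array, True where the context position is an image token.
    """
    return attn[:, visual_mask].sum(axis=1)

def select_anchors(scores, fraction=0.15):
    """Boolean mask marking the top `fraction` of tokens by connectivity score.

    Assumption: a simple top-k cut stands in for the clustering step in the paper.
    """
    k = max(1, int(round(fraction * scores.shape[0])))
    top = np.argsort(scores)[-k:]          # indices of the k highest scores
    mask = np.zeros(scores.shape[0], dtype=bool)
    mask[top] = True
    return mask

def anchor_policy_loss(logprobs, advantage, mask):
    """REINFORCE-style loss averaged over anchor tokens only.

    logprobs: (T,) log-probabilities of the sampled tokens.
    advantage: scalar sequence-level advantage (hypothetical simplification).
    """
    return -(advantage * logprobs[mask]).sum() / mask.sum()
```

With 20 generated tokens and the default fraction, `select_anchors` marks 3 tokens, and only their log-probabilities contribute to the loss. A real training loop would need attention maps aggregated over layers and heads, which this sketch leaves unspecified.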
Taxonomy
Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Advanced Graph Neural Networks
