Credit Where It is Due: Cross-Modality Connectivity Drives Precise Reinforcement Learning for MLLM Reasoning

Zhengbo Jiao; Shaobo Wang; Zifan Zhang; Wei Wang; Bing Zhao; Hu Wei; Linfeng Zhang

arXiv:2602.11455 · cs.AI · February 13, 2026

Open Access

TL;DR

This paper investigates how multimodal large language models integrate visual information during reasoning. It finds that a small subset of tokens with strong visual-textual coupling serves as anchors, and it introduces a reinforcement learning framework that strengthens this grounding.

Contribution

It uncovers the role of high-connectivity tokens as anchors in multimodal reasoning and proposes Anchor-Token Reinforcement Learning (AT-RL) to improve visual grounding with minimal overhead.

Findings

01 High-connectivity tokens are key anchors in reasoning.

02 AT-RL improves model performance on multiple tasks.

03 Focusing on low-connectivity tokens degrades reasoning quality.

Abstract

Reinforcement Learning with Verifiable Rewards (RLVR) has significantly advanced the reasoning capabilities of Multimodal Large Language Models (MLLMs), yet how visual evidence is integrated during reasoning remains poorly understood. We explore multimodal RLVR through the lens of cross-modal attention connectivity and find that only a small fraction of tokens (approximately 15%) exhibit strong visual-textual coupling. These high-connectivity tokens act as anchors that ground reasoning in the image, while the majority follow linguistic patterns. During RLVR training, credit assignment naturally concentrates on these anchors, sharpening their visual grounding over time. Building on this insight, we propose Anchor-Token Reinforcement Learning (AT-RL), a lightweight framework that selectively reinforces high-connectivity tokens via graph-based clustering of attention topology. Evaluated…
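The idea of reinforcing only high-connectivity tokens can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract describes graph-based clustering of attention topology, whereas the sketch below swaps in a much simpler stand-in, namely ranking generated tokens by the attention mass they place on image tokens and keeping the top fraction (15%, matching the approximate figure in the abstract). The function names `visual_connectivity`, `anchor_mask`, and `reweight_advantages`, the weights, and the synthetic attention matrix are all hypothetical and chosen only for illustration.

```python
import numpy as np


def visual_connectivity(attn, visual_mask):
    """Attention mass each generated token places on image tokens.

    attn: array of shape (T_gen, T_ctx), each row a normalised attention
          distribution over context positions (an assumed input format).
    visual_mask: boolean array of shape (T_ctx,), True for image tokens.
    """
    attn = np.asarray(attn, dtype=float)
    visual_mask = np.asarray(visual_mask, dtype=bool)
    return attn[:, visual_mask].sum(axis=1)


def anchor_mask(connectivity, top_fraction=0.15):
    """Mark the top `top_fraction` of tokens by connectivity as anchors.

    Assumption: a plain top-k threshold stands in for the paper's
    graph-based clustering, which the abstract does not specify in detail.
    """
    connectivity = np.asarray(connectivity, dtype=float)
    k = max(1, int(round(top_fraction * len(connectivity))))
    top_idx = np.argsort(connectivity)[-k:]
    mask = np.zeros(len(connectivity), dtype=bool)
    mask[top_idx] = True
    return mask


def reweight_advantages(advantages, mask, anchor_weight=1.0, other_weight=0.0):
    """Scale per-token advantages so that anchors dominate the update.

    The default weights (1.0 for anchors, 0.0 elsewhere) are illustrative
    assumptions, not values taken from the paper.
    """
    advantages = np.asarray(advantages, dtype=float)
    return np.where(mask, anchor_weight, other_weight) * advantages


# Synthetic example: 20 generated tokens attending over 30 context
# positions, the first 10 of which are treated as image tokens.
rng = np.random.default_rng(0)
attn = rng.dirichlet(np.ones(30), size=20)
is_visual = np.zeros(30, dtype=bool)
is_visual[:10] = True

conn = visual_connectivity(attn, is_visual)
anchors = anchor_mask(conn, top_fraction=0.15)
token_advantages = reweight_advantages(np.ones(20), anchors)
```

In an RLVR setup with a sequence-level reward, the reweighted per-token advantages would multiply the per-token log-probability terms of the policy-gradient loss, so that tokens outside the anchor set contribute little or nothing to the update.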

Taxonomy

Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Advanced Graph Neural Networks