Beyond Where to Look: Trajectory-Guided Reinforcement Learning for Multimodal RLVR
Jinda Lu, Junkang Wu, Jinghan Li, Kexin Huang, Shuo Yang, Mingzhu Chen, Jiancan Wu, Kuien Liu, Xiang Wang

TL;DR
This paper introduces Trajectory-Guided Reinforcement Learning (TGRL), a method that strengthens reasoning in multimodal large language models (MLLMs) by using expert reasoning trajectories from stronger models to guide the policy in integrating visual evidence into its reasoning chains.
Contribution
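The paper proposes TGRL, a reinforcement learning approach that targets a gap left by prior multimodal RLVR work: models can attend to the relevant visual regions yet fail to use that evidence in later reasoning steps. TGRL uses expert reasoning trajectories from stronger models to guide the policy toward reasoning that is grounded in visual facts.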
Findings
TGRL significantly improves reasoning accuracy on multiple multimodal reasoning benchmarks.
Token-level reweighting and trajectory filtering stabilize policy training.
TGRL effectively bridges visual perception and logical reasoning.
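To make the two stabilizing components named in the findings more concrete, here is a minimal sketch of what trajectory filtering and token-level reweighting could look like. This summary does not give the paper's actual formulation, so everything below is an assumption for illustration: the `Trajectory` class, the exact-match filter, the per-token weights, and the weighted negative log-likelihood objective are all hypothetical and should not be read as the authors' method.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Trajectory:
    """Hypothetical container for one expert reasoning trajectory."""
    token_logprobs: List[float]  # policy log-probability of each expert token
    token_weights: List[float]   # assumed per-token importance weights
    final_answer: str            # answer extracted from the trajectory


def filter_trajectories(trajs: List[Trajectory], reference: str) -> List[Trajectory]:
    """Assumed filter: keep trajectories whose final answer matches the verifiable reference."""
    ref = reference.strip()
    return [t for t in trajs if t.final_answer.strip() == ref]


def weighted_nll(traj: Trajectory) -> float:
    """Assumed objective: weight-normalized negative log-likelihood over tokens."""
    total_weight = sum(traj.token_weights)
    if total_weight == 0:
        return 0.0
    weighted_sum = sum(w * lp for w, lp in zip(traj.token_weights, traj.token_logprobs))
    return -weighted_sum / total_weight


# Toy data: one trajectory with the correct answer, one with a wrong answer.
trajectories = [
    Trajectory([-1.0, -2.0], [1.0, 3.0], "42"),
    Trajectory([-0.5, -0.5], [1.0, 1.0], "17"),
]
kept = filter_trajectories(trajectories, "42")
loss = weighted_nll(kept[0])
```

In this toy setup, tokens with larger weights (for instance, tokens that state visual evidence) contribute more to the loss, and trajectories with incorrect final answers are dropped before any update. How the paper actually assigns weights or decides which trajectories to keep is not stated in this summary.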
Abstract
Recent advances in Reinforcement Learning with Verifiable Rewards (RLVR) for multimodal large language models (MLLMs) have mainly focused on improving final answer correctness and strengthening visual grounding. However, a critical bottleneck remains: although models can attend to relevant visual regions, they often fail to effectively incorporate visual evidence into subsequent reasoning, leading to reasoning chains that are weakly grounded in visual facts. To address this issue, we propose Trajectory-Guided Reinforcement Learning (TGRL), which guides the policy model to integrate visual evidence into fine-grained reasoning processes using expert reasoning trajectories from stronger models. We further introduce token-level reweighting and trajectory filtering to ensure stable and effective policy optimization. Extensive experiments on multiple multimodal reasoning benchmarks…
