Perceptual-Evidence Anchored Reinforced Learning for Multimodal Reasoning

Chi Zhang; Haibo Qiu; Qiming Zhang; Yufei Xu; Zhixiong Zeng; Siqi Yang; Peng Shi; Lin Ma; Jing Zhang

arXiv:2511.18437 · cs.CV · November 25, 2025

TL;DR

PEARL enhances multimodal reasoning in vision-language models by anchoring reasoning to verified visual evidence, reducing hallucinations and improving accuracy through perception checks and reinforcement learning.

Contribution

This paper introduces PEARL, a novel perception-reasoning framework that explicitly incorporates visual evidence verification into reinforcement learning for multimodal models.

Findings

01 Achieves a +9.7% improvement on the MathVerse benchmark.

02 Reduces visual hallucinations and reasoning errors.

03 Integrates effectively with existing RL methods such as GRPO and DAPO.

Abstract

Reinforcement Learning with Verifiable Rewards (RLVR) has significantly advanced the reasoning capabilities of Large Language Models (LLMs) and is now being applied to Vision-Language Models (VLMs). However, vanilla RLVR for VLMs verifies only the final textual output and neglects the foundational step of visual perception. This oversight leads to visual hallucinations and reward hacking, because reasoning built on flawed perception is inherently unreliable. To address this, we propose PEARL (Perceptual-Evidence Anchored Reinforced Learning), a dual-branch, perception-reasoning synergistic framework that strengthens multimodal reasoning by explicitly anchoring it to verified visual evidence. For each reasoning-oriented QA instance, PEARL first derives a perception checklist -- a set of perception-oriented sub-questions with verifiable answers that probe the model's understanding of key…
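
The perception checklist described in the abstract can be pictured as a list of sub-questions paired with verifiable answers, plus a checker that compares a model's answers against them. The following is a minimal Python sketch, not the authors' implementation: the names `ChecklistItem`, `normalize`, and `checklist_pass_rate` are hypothetical, exact match after normalization is an assumed verification rule, and how PEARL actually turns checklist results into a training signal is not stated in the text available here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ChecklistItem:
    """One perception-oriented sub-question with a verifiable answer (hypothetical type)."""
    question: str
    expected: str


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences do not count as errors."""
    return " ".join(text.strip().lower().split())


def checklist_pass_rate(items: List[ChecklistItem], answers: Dict[str, str]) -> float:
    """Return the fraction of checklist items the model answered correctly.

    Assumption: an answer is correct when it matches the expected answer exactly
    after normalization. A missing answer counts as incorrect.
    """
    if not items:
        return 0.0
    passed = sum(
        normalize(answers.get(item.question, "")) == normalize(item.expected)
        for item in items
    )
    return passed / len(items)


# Illustrative data only; these sub-questions are made up for the example.
checklist = [
    ChecklistItem("How many sides does the polygon have?", "5"),
    ChecklistItem("What is the label of the marked angle?", "ABC"),
]
model_answers = {
    "How many sides does the polygon have?": " 5 ",
    "What is the label of the marked angle?": "ACB",
}
rate = checklist_pass_rate(checklist, model_answers)
print(rate)  # 0.5: one of the two perception checks passed
```

A score like this could, for instance, be inspected alongside the final-answer check to see whether a correct answer was reached with faulty perception; that use is an illustration, not a description of the paper's reward design.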

Taxonomy

Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Topic Modeling