PaLMR: Towards Faithful Visual Reasoning via Multimodal Process Alignment

Yantao Li; Qiang Hui; Chenyang Yan; Kanzhi Cheng; Fang Zhao; Chao Tan; Huanling Gao; Jianbing Zhang; Kai Wang; Xinyu Dai; Shiguo Lian

arXiv:2603.06652 · cs.CV · March 10, 2026

Open Access

TL;DR

PaLMR introduces a framework that aligns reasoning processes with visual evidence in multimodal models, reducing hallucinations and enhancing reasoning fidelity for more reliable AI systems.

Contribution

PaLMR presents a process-aligned training approach that combines structured reasoning data with hierarchical rewards to improve visual reasoning accuracy in multimodal large language models.

Findings

01. Significantly reduces reasoning hallucinations on HallusionBench.

02. Achieves state-of-the-art results in visual reasoning tasks.

03. Maintains strong performance on multiple benchmark datasets.

Abstract

Reinforcement learning has recently improved the reasoning ability of Large Language Models and Multimodal LLMs, yet prevailing reward designs emphasise final-answer correctness and consequently tolerate process hallucinations, that is, cases where models reach the right answer while misperceiving visual evidence. We address this process-level misalignment with PaLMR, a framework that aligns not only outcomes but also the reasoning process itself. PaLMR comprises two complementary components: a perception-aligned data layer that constructs process-aware reasoning data with structured pseudo-ground-truths and verifiable visual facts, and a process-aligned optimisation layer that constructs a hierarchical reward fusion scheme with a process-aware scoring function to encourage visually faithful chains-of-thought and improve training stability. Experiments on Qwen2.5-VL-7B show that our approach…
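The abstract does not spell out the reward formula, so the following is only an illustrative sketch of how a process-aware score could be fused with an outcome reward. Everything in it is an assumption made for illustration and is not taken from the paper: the `Rollout` structure, the exact-match fact check in `process_score`, the gating rule, and the `process_weight` and `gate` parameters are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Rollout:
    """Hypothetical container for one sampled response."""
    answer: str
    claimed_facts: list[str]  # visual facts asserted in the chain of thought


def process_score(claimed: list[str], verified: set[str]) -> float:
    """Fraction of claimed visual facts found in the verified set.

    Exact string matching is a stand-in; a real scorer would need
    a softer comparison. Returns 0.0 when nothing is claimed.
    """
    if not claimed:
        return 0.0
    hits = sum(1 for fact in claimed if fact in verified)
    return hits / len(claimed)


def fused_reward(
    rollout: Rollout,
    gold_answer: str,
    verified_facts: set[str],
    process_weight: float = 0.5,  # assumed mixing weight
    gate: float = 0.6,            # assumed faithfulness threshold
) -> float:
    """Assumed two-level fusion: the outcome reward only counts
    when the process score clears the gate."""
    correct = rollout.answer.strip().lower() == gold_answer.strip().lower()
    outcome = 1.0 if correct else 0.0
    process = process_score(rollout.claimed_facts, verified_facts)
    if process < gate:
        # Right answer, unfaithful reasoning: no outcome credit.
        return process_weight * process
    return process_weight * process + (1.0 - process_weight) * outcome


verified = {"a blue car", "two dogs"}
faithful = Rollout(answer="Blue", claimed_facts=["a blue car", "two dogs"])
hallucinated = Rollout(answer="Blue", claimed_facts=["a red car", "two dogs"])

r_faithful = fused_reward(faithful, "blue", verified)
r_hallucinated = fused_reward(hallucinated, "blue", verified)
print(r_faithful, r_hallucinated)
```

Under these assumed settings, both rollouts give the correct final answer, but the one that misdescribes the image receives a lower reward. That is the behaviour an outcome-only reward cannot produce, and it is the gap the abstract describes as process hallucination.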

Taxonomy

Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Advanced Graph Neural Networks