Looking Back and Forth: Cross-Image Attention Calibration and Attentive Preference Learning for Multi-Image Hallucination Mitigation

Xiaochen Yang; Hao Fang; Jiawei Kong; Yaoxin Mao; Bin Chen; Shu-Tao Xia

arXiv:2603.07048·cs.CV·March 10, 2026

TL;DR

This paper introduces CAPL, a framework that mitigates multi-image hallucinations in vision-language models by calibrating cross-image attention and applying cross-image preference learning, which yields more accurate and reliable multi-image reasoning.

Contribution

The paper proposes a novel structured approach combining cross-image attention calibration and preference learning to reduce hallucinations in multi-image tasks.

Findings

01

CAPL reduces hallucinations on multi-image tasks across various models.

02

The framework maintains or slightly enhances single-image task performance.

03

Experimental results show consistent gains on multiple benchmarks.

Abstract

Although large vision-language models (LVLMs) have demonstrated remarkable capabilities, they are prone to hallucinations in multi-image tasks. We attribute this issue to limitations in existing attention mechanisms and insufficient cross-image modeling. Motivated by this observation, we propose a structured hallucination mitigation framework involving Cross-Image Attention calibration and Preference Learning (CAPL). CAPL explicitly enhances inter-image interactions at the architectural level while reinforcing reliance on genuine cross-image evidence during training, thereby improving the model's perception and modeling of cross-image associations. Specifically, we (i) introduce a selectable image token interaction attention mechanism to establish fine-grained cross-image entity alignment and information flow; (ii) design a cross-image modeling-based preference optimization strategy that contrasts…
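The abstract does not say how the selectable image token interaction attention is implemented. As a rough illustration only, the sketch below assumes it can be expressed as an attention mask over a decoder's input sequence: ordinary causal masking, plus extra edges that let selected image tokens attend to selected tokens of other images, including images that appear later in the sequence (one possible reading of "looking back and forth"). The function name `build_cross_image_mask`, the `image_ids` encoding (-1 for text tokens), and the `selected` flags are hypothetical and are not taken from the paper.

```python
from typing import List, Optional


def build_cross_image_mask(
    image_ids: List[int],
    selected: Optional[List[bool]] = None,
) -> List[List[bool]]:
    """Hypothetical sketch: mask[q][k] is True if query position q may attend to key k.

    image_ids[i] is the index of the image that token i belongs to, or -1 for text.
    selected[i] marks image tokens allowed to join cross-image attention
    (assumption: by default every image token is selected).
    """
    n = len(image_ids)
    if selected is None:
        selected = [img >= 0 for img in image_ids]

    # Start from a standard causal mask: each token sees itself and earlier tokens.
    mask = [[k <= q for k in range(n)] for q in range(n)]

    for q in range(n):
        for k in range(n):
            both_images = image_ids[q] >= 0 and image_ids[k] >= 0
            different_images = image_ids[q] != image_ids[k]
            if both_images and different_images and selected[q] and selected[k]:
                # Extra edge between selected tokens of two different images,
                # so earlier images can also "look forward" to later ones.
                mask[q][k] = True
    return mask


if __name__ == "__main__":
    # Two tokens of image 0, one text token, two tokens of image 1.
    m = build_cross_image_mask([0, 0, -1, 1, 1])
    for row in m:
        print(["x" if allowed else "." for allowed in row])
```

With `image_ids = [0, 0, -1, 1, 1]`, the first token of image 0 can now attend to both tokens of image 1, while the text token in between still sees only earlier positions and attention within each image stays causal. A real model would convert such a boolean mask into whatever additive or boolean form its attention kernel expects.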

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Advanced Image Processing Techniques · Generative Adversarial Networks and Image Synthesis