Looking Back and Forth: Cross-Image Attention Calibration and Attentive Preference Learning for Multi-Image Hallucination Mitigation
Xiaochen Yang, Hao Fang, Jiawei Kong, Yaoxin Mao, Bin Chen, Shu-Tao Xia

TL;DR
This paper introduces CAPL, a framework that improves multi-image hallucination mitigation in vision-language models by enhancing cross-image attention and preference learning, leading to more accurate and reliable multi-image reasoning.
Contribution
The paper proposes a novel structured approach combining cross-image attention calibration and preference learning to reduce hallucinations in multi-image tasks.
Findings
CAPL improves multi-image hallucination performance across various models.
The framework maintains or slightly enhances single-image task performance.
Experimental results show consistent gains on multiple benchmarks.
Abstract
Although large vision-language models (LVLMs) have demonstrated remarkable capabilities, they are prone to hallucinations in multi-image tasks. We attribute this issue to limitations in existing attention mechanisms and insufficient cross-image modeling. Inspired by this, we propose a structured hallucination mitigation framework involving Cross-Image Attention calibration and Preference Learning (CAPL). CAPL explicitly enhances inter-image interactions at the architectural level while reinforcing reliance on genuine cross-image evidence during training, thereby improving the model's perception and modeling of cross-image associations. Specifically, we (i) introduce a selectable image token interaction attention mechanism to establish fine-grained cross-image entity alignment and information flow; (ii) design a cross-image modeling-based preference optimization strategy that contrasts…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdversarial Robustness in Machine Learning · Advanced Image Processing Techniques · Generative Adversarial Networks and Image Synthesis
