CLAIM: Mitigating Multilingual Object Hallucination in Large Vision-Language Models with Cross-Lingual Attention Intervention

Zekai Ye; Qiming Li; Xiaocheng Feng; Libo Qin; Yichong Huang; Baohang Li; Kui Jiang; Yang Xiang; Zhirui Zhang; Yunfei Lu; Duyu Tang; Dandan Tu; Bing Qin

arXiv:2506.11073·cs.CL·June 16, 2025


TL;DR

This paper introduces CLAIM, a near training-free method that aligns cross-modal attention patterns to reduce multilingual object hallucination in large vision-language models, improving their visual perception across languages.

Contribution

CLAIM is a novel approach that intervenes in attention outputs during inference to mitigate hallucinations without extensive retraining or fine-tuning.
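The intervention step can be pictured as a small edit to selected attention-head outputs at inference time. The sketch below is a minimal illustration, not the authors' implementation. It assumes that per-head outputs are available as plain vectors keyed by a hypothetical `(layer, head)` index, that a shift vector pointing from English toward the target language has already been estimated for each intervened head, and that the correction subtracts that shift scaled by a hypothetical strength `alpha`. The paper's exact scaling and hook placement are not given in this summary.

```python
from typing import Dict, List, Tuple

# Hypothetical addressing scheme: (layer index, head index).
HeadId = Tuple[int, int]


def intervene(
    head_outputs: Dict[HeadId, List[float]],
    shift_vectors: Dict[HeadId, List[float]],
    alpha: float = 1.0,
) -> Dict[HeadId, List[float]]:
    """Return a copy of head_outputs in which every head that has a shift
    vector is moved back toward the English pattern by alpha * shift.

    Heads without a shift vector pass through unchanged. `alpha` is an
    assumed intervention-strength hyperparameter.
    """
    adjusted: Dict[HeadId, List[float]] = {}
    for head, output in head_outputs.items():
        shift = shift_vectors.get(head)
        if shift is None:
            # Not a selected language-specific head: leave it untouched.
            adjusted[head] = list(output)
            continue
        if len(shift) != len(output):
            raise ValueError(f"dimension mismatch at head {head}")
        # Undo the English -> target-language shift for this head.
        adjusted[head] = [o - alpha * s for o, s in zip(output, shift)]
    return adjusted


if __name__ == "__main__":
    outputs = {(0, 0): [1.0, 2.0], (0, 1): [0.5, 0.5]}
    shifts = {(0, 1): [0.25, -0.25]}
    print(intervene(outputs, shifts, alpha=2.0))
```

Only the heads listed in `shift_vectors` change; here `(0, 1)` becomes `[0.0, 1.0]` while `(0, 0)` is copied as-is. Because no weights are updated, the edit costs one vector subtraction per selected head per forward pass.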

Findings

01

Achieves a 13.56% average improvement on the POPE benchmark

02

Up to 30% improvement on POPE in Spanish

03

Reduces object hallucination for queries in non-English languages

Abstract

Large Vision-Language Models (LVLMs) have demonstrated impressive multimodal abilities but remain prone to multilingual object hallucination: they are more likely to generate responses inconsistent with the visual input when queried in non-English languages than in English. Most existing approaches to this problem rely on pretraining or fine-tuning, which is resource-intensive. In this paper, inspired by the observed disparities in cross-modal attention patterns across languages, we propose Cross-Lingual Attention Intervention for Mitigating multilingual object hallucination (CLAIM) in LVLMs, a novel near training-free method that aligns these attention patterns. CLAIM first identifies language-specific cross-modal attention heads, then estimates language shift vectors from English to the target language, and finally intervenes in the attention outputs during inference…
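The first two steps (choosing heads and estimating shift vectors) can be sketched as follows. This is a hedged illustration under stated assumptions, not the paper's procedure. It assumes we already have head outputs recorded from paired English and target-language queries on the same images, restricted to cross-modal heads. It estimates each head's shift as the difference of mean outputs (target minus English, i.e. "from English to the target language"). The summary does not say how CLAIM selects language-specific heads, so the sketch substitutes a simple stand-in heuristic: keep the `top_k` heads whose shift has the largest L2 norm. `estimate_shifts`, `mean_vector`, and `top_k` are hypothetical names.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical addressing scheme: (layer index, head index).
HeadId = Tuple[int, int]


def mean_vector(vectors: List[List[float]]) -> List[float]:
    """Element-wise mean of equally sized, non-empty vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    n, dim = len(vectors), len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]


def estimate_shifts(
    english: Dict[HeadId, List[List[float]]],
    target: Dict[HeadId, List[List[float]]],
    top_k: int,
) -> Dict[HeadId, List[float]]:
    """Estimate per-head shift vectors (mean target - mean English) and keep
    the top_k heads by shift norm (a stand-in selection heuristic).

    `english[h]` and `target[h]` hold head h's outputs for paired queries,
    assumed to be collected from cross-modal heads only.
    """
    if english.keys() != target.keys():
        raise ValueError("English and target activations must cover the same heads")
    shifts: Dict[HeadId, List[float]] = {}
    for head in english:
        mu_en = mean_vector(english[head])
        mu_tgt = mean_vector(target[head])
        shifts[head] = [t - e for t, e in zip(mu_tgt, mu_en)]

    def norm(head: HeadId) -> float:
        return math.sqrt(sum(x * x for x in shifts[head]))

    # Stable sort: ties keep their original head order.
    selected = sorted(shifts, key=norm, reverse=True)[:top_k]
    return {head: shifts[head] for head in selected}


if __name__ == "__main__":
    english = {(0, 0): [[1.0, 0.0], [1.0, 2.0]], (0, 1): [[0.0, 0.0], [0.0, 0.0]]}
    target = {(0, 0): [[1.0, 1.0], [1.0, 1.0]], (0, 1): [[2.0, 0.0], [4.0, 0.0]]}
    print(estimate_shifts(english, target, top_k=1))
```

In the toy data, head `(0, 0)` has the same mean output in both languages, so its shift is zero. Head `(0, 1)` moves by `[3.0, 0.0]` and is the one selected. A real setup would collect these activations from the model's forward passes, and the resulting dictionary would feed the inference-time intervention.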


