HalluRNN: Mitigating Hallucinations via Recurrent Cross-Layer Reasoning in Large Vision-Language Models
Le Yu, Kaishen Wang, Jianlong Xiong, Yue Cao, Lei Zhang, Zhang Yi, Tao He

TL;DR
HalluRNN introduces a recurrent cross-layer reasoning architecture built around a novel Dual-Gated Depth Propagation Unit (DG-DPU) to reduce hallucinations in large vision-language models, improving stability and consistency without extensive fine-tuning.
Contribution
The paper presents HalluRNN, a new architecture with a shared DG-DPU module that refines hidden states recurrently, addressing hallucinations at the architecture level rather than through data or decoding strategies.
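To make the idea of one shared module refining hidden states across depth more concrete, here is a minimal sketch. It is not the paper's DG-DPU formulation, which this summary does not give. The class `ToyDualGatedUnit`, its per-dimension weights, the normalised two-gate blend, and the helper `run_with_shared_unit` are all assumptions chosen for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class ToyDualGatedUnit:
    """Hypothetical stand-in for a dual-gated unit (NOT the paper's DG-DPU).

    One instance is reused at every layer, so its parameter count does not
    grow with model depth.
    """

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)
        # Assumed per-dimension (diagonal) gate weights, for simplicity.
        self.w_keep = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        self.w_admit = [rng.uniform(-1.0, 1.0) for _ in range(dim)]

    def step(self, carried, layer_out):
        """Blend the state carried from earlier layers with the new layer output."""
        refined = []
        for c, x, wk, wa in zip(carried, layer_out, self.w_keep, self.w_admit):
            keep = sigmoid(wk * c)    # gate on the carried state
            admit = sigmoid(wa * x)   # gate on the current layer's output
            # Normalised blend: the result always lies between c and x.
            refined.append((keep * c + admit * x) / (keep + admit))
        return refined


def run_with_shared_unit(hidden, layers, unit):
    """Apply each layer, then refine its output with the same shared unit."""
    carried = hidden
    for layer in layers:
        carried = unit.step(carried, layer(carried))
    return carried


# Usage: four toy "layers" that each shift the state by +1.
toy_layers = [lambda h: [v + 1.0 for v in h]] * 4
shared_unit = ToyDualGatedUnit(dim=3, seed=1)
final_state = run_with_shared_unit([0.0, 0.0, 0.0], toy_layers, shared_unit)
```

Because each refined value stays between the carried state and the new layer output, a single layer cannot move the representation arbitrarily far from what earlier layers produced. This is one simple way to picture cross-layer consistency; the actual gating in the paper may differ.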
Findings
Reduces hallucinations in LVLMs effectively.
Achieves robust performance across multiple benchmarks.
Requires only fine-tuning of the DG-DPU module.
Abstract
Though Large Vision-Language Models (LVLMs) have achieved remarkable performance across various tasks, they are still prone to hallucinations: generating outputs that are textually plausible but visually ungrounded. While prior approaches generally address this issue through data-centric fine-tuning or innovative decoding strategies, these methods often require substantial resources or task-specific configurations. In this work, we introduce an architecture-level solution, HalluRNN, which enhances model stability through recurrent cross-layer reasoning. Specifically, we propose a novel Dual-Gated Depth Propagation Unit (DG-DPU) module, which is shared across layers and recurrently refines hidden states. This allows for the adaptive propagation of information throughout the model, enforces consistency across layers, and mitigates hallucinations caused by representational drift. By…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Multimodal Machine Learning Applications · COVID-19 diagnosis using AI
