Hallucination-aware intermediate representation edit in large vision-language models

Wei Suo; Hanzu Zhang; Lijun Zhang; Ji Ma; Peng Wang; Yanning Zhang

arXiv:2603.29405 · cs.CV · April 1, 2026

TL;DR

This paper introduces a framework that dynamically detects hallucination representations in large vision-language models and edits them to remove the hallucination, at minimal additional computational cost.

Contribution

It proposes a novel method for detecting and editing hallucination representations in vision-language models, improving robustness and controllability.

Findings

01. Achieves state-of-the-art hallucination mitigation performance.

02. Operates with minimal additional computational cost.

03. Demonstrates effective hallucination elimination and controllability.

Abstract

Large Vision-Language Models have demonstrated exceptional performance in multimodal reasoning and complex scene understanding. However, these models still face significant hallucination issues, where outputs contradict visual facts. Recent research on hallucination mitigation has focused on retraining methods and Contrastive Decoding (CD) methods. While both methods perform well, retraining methods require substantial training resources, and CD methods introduce dual inference overhead. These factors hinder their practical applicability. To address the above issue, we propose a framework for dynamically detecting hallucination representations and performing hallucination-eliminating edits on these representations. With minimal additional computational cost, we achieve state-of-the-art performance on existing benchmarks. Extensive experiments demonstrate the effectiveness of our…

Code & Models

Repositories

ASGO-MM/HIRE (GitHub)

Videos

Hallucination-aware Intermediate Representation Edit in Large Vision-Language Models (SlidesLive)