SAVER: Mitigating Hallucinations in Large Vision-Language Models via Style-Aware Visual Early Revision

Zhaoxu Li; Chenqi Kong; Yi Yu; Qiangqiang Wu; Xinghao Jiang; Ngai-Man Cheung; Bihan Wen; Alex Kot; Xudong Jiang

arXiv:2508.03177 · cs.CV · August 6, 2025

TL;DR

This paper introduces SAVER, a style-aware method that reduces hallucinations in vision-language models when processing stylized images, improving their reliability in critical applications.

Contribution

The paper presents a novel style-aware early revision mechanism that leverages visual attention feedback to mitigate hallucinations in LVLMs, especially with stylized images.

Findings

01. Stylized images cause more hallucinations than photographic images.

02. SAVER significantly reduces hallucinations across multiple models and datasets.

03. The method improves the reliability of LVLMs in real-world scenarios.

Abstract

Large Vision-Language Models (LVLMs) have recently achieved significant breakthroughs in understanding complex visual-textual contexts. However, hallucination issues still limit their real-world applicability. Although previous mitigation methods effectively reduce hallucinations in photographic images, they largely overlook the potential risks posed by stylized images, which play crucial roles in critical scenarios such as game scene understanding, art education, and medical analysis. In this work, we first construct a dataset comprising photographic images and their corresponding stylized versions with carefully annotated caption labels. We then conduct head-to-head comparisons on both discriminative and generative tasks by benchmarking 13 advanced LVLMs on the collected datasets. Our findings reveal that stylized images tend to induce significantly more hallucinations than their…
