VASparse: Towards Efficient Visual Hallucination Mitigation via Visual-Aware Token Sparsification

Xianwei Zhuang; Zhihong Zhu; Yuxin Xie; Liming Liang; Yuexian Zou

arXiv:2501.06553·cs.CV·March 24, 2025

TL;DR

VASparse is a training-free, plug-and-play decoding method that sparsifies tokens in a visual-aware way to reduce visual hallucinations in large vision-language models, improving output faithfulness while keeping decoding speed competitive.

Contribution

The paper presents a novel visual-aware token sparsification strategy that effectively mitigates hallucinations in LVLMs during decoding, without additional training or post-processing.

Findings

01. VASparse reduces visual hallucinations across multiple benchmarks.

02. It maintains competitive decoding speed compared to existing methods.

03. VASparse achieves state-of-the-art hallucination mitigation performance.

Abstract

Large Vision-Language Models (LVLMs) may produce outputs that are unfaithful to reality, also known as visual hallucinations (VH), which significantly impedes their real-world usage. To alleviate VH, various decoding strategies have been proposed to enhance visual information. However, many of these methods may require secondary decoding and rollback, which significantly reduces inference speed. In this work, we propose an efficient plug-and-play decoding algorithm via Visual-Aware Sparsification (VASparse) from the perspective of token sparsity for mitigating VH. VASparse is inspired by two empirical observations: (1) the sparse activation of attention in LVLMs, and (2) visual-agnostic token sparsification exacerbates VH. Based on these insights, we propose a novel token sparsification strategy that balances efficiency and trustworthiness. Specifically, VASparse implements a visual-aware…
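The abstract is cut off before it describes the actual selection rule, so the following is only a toy illustration of the general idea behind the two observations, not the authors' algorithm. It assumes a single row of attention logits over six context tokens, a boolean mask marking which tokens are visual, and a hypothetical `visual_bonus` parameter that nudges visual tokens up the ranking before the top-k cut. All names and numbers are made up for the example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_indices(scores, k):
    """Indices of the k highest scores, returned in original token order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def sparsify(attn_logits, is_visual, k, visual_bonus=0.0):
    """Keep k tokens ranked by attention weight.

    visual_bonus is a hypothetical knob (an assumption of this sketch):
    0.0 gives visual-agnostic selection, a positive value favors visual tokens.
    """
    weights = softmax(attn_logits)
    scores = [w + (visual_bonus if v else 0.0) for w, v in zip(weights, is_visual)]
    return top_k_indices(scores, k)


# Toy data: tokens 1, 2 and 5 are visual and receive little attention.
attn_logits = [2.0, 0.1, 0.3, 1.8, 1.5, 0.2]
is_visual = [False, True, True, False, False, True]

agnostic = sparsify(attn_logits, is_visual, k=3)
aware = sparsify(attn_logits, is_visual, k=3, visual_bonus=0.2)

print(agnostic)  # [0, 3, 4]: every visual token is pruned
print(aware)     # [0, 2, 3]: one visual token survives the cut
```

In this toy setup the visual-agnostic cut discards all visual tokens because attention mass is concentrated on a few text tokens, which is one plausible way pruning could worsen hallucination. The additive bonus is just the simplest stand-in for "visual-aware" scoring; the paper's own criterion may differ substantially.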

Code & Models

Repositories

mengchuang123/vasparse-github
jax · Official

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis · Digital Media Forensic Detection

Methods: Softmax · Attention Is All You Need