MINT: Mitigating Hallucinations in Large Vision-Language Models via Token Reduction

Chao Wang; Jianming Yang; Yang Zhou

arXiv:2502.00717 · cs.CV · February 4, 2025

TL;DR

This paper introduces MINT, a training-free decoding strategy that reduces hallucinations in large vision-language models by masking irrelevant image tokens and emphasizing key visual regions, leading to improved perception and reliability.

Contribution

MINT is a novel, training-free method that mitigates hallucinations in LVLMs by dynamically reducing attention to non-essential tokens and enhancing focus on key visual elements.
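The core idea of masking attention to non-essential image tokens can be illustrated with a minimal sketch. It assumes, hypothetically, that image tokens occupy a contiguous range of key positions and that "non-essential" means the image tokens with the lowest attention logits for the current query. The `keep_ratio` parameter and this selection rule are illustrative assumptions, not the paper's published criterion. Masked positions are set to negative infinity before the softmax, so they receive zero weight, and the freed attention mass is redistributed over the remaining tokens.

```python
import math


def masked_image_attention(scores, image_start, image_end, keep_ratio):
    """Softmax over one query's attention logits, masking low-scoring image tokens.

    scores: raw attention logits for one query over all key positions.
    image_start, image_end: half-open range of key positions holding image tokens
        (assumed contiguous for this sketch).
    keep_ratio: fraction of image tokens left visible (hypothetical parameter).
    """
    image_idx = list(range(image_start, image_end))
    n_keep = max(1, round(keep_ratio * len(image_idx)))
    # Assumed criterion: keep the image tokens this query already attends to most.
    kept = set(sorted(image_idx, key=lambda i: scores[i], reverse=True)[:n_keep])
    masked = [
        float("-inf") if (image_start <= i < image_end and i not in kept) else s
        for i, s in enumerate(scores)
    ]
    # Numerically stable softmax; exp(-inf) == 0.0, so masked tokens get no weight.
    m = max(masked)
    exps = [math.exp(s - m) for s in masked]
    total = sum(exps)
    return [e / total for e in exps]


# Usage: positions 1..4 are image tokens; keep the top half of them.
weights = masked_image_attention([2.0, 0.5, 1.5, -1.0, 0.2, 1.0], 1, 5, 0.5)
print([round(w, 3) for w in weights])
```

Non-image tokens, such as the text prompt, are never masked here. The sketch narrows only the visual context the query can attend to.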

Findings

01. Achieves a 4% reduction in hallucinations on benchmarks

02. Perceives 5% more visual points despite the token reduction

03. Improves model reliability in high-stakes domains

Abstract

Hallucination has been a long-standing and inevitable problem that hinders the application of Large Vision-Language Models (LVLMs) in domains that require high reliability. Existing methods pursue improvements that depend on data annotations or training strategies, yet place less emphasis on the LLM's inherent problems. To fill this gap, we delve into the attention mechanism of the decoding process in the LVLM. Intriguingly, our investigation uncovers prevalent attention redundancy within the hierarchical architecture of the LVLM, manifesting as overextended image processing in deep layers and an overabundance of non-essential image tokens. Building on this observation, we propose MINT, a novel training-free decoding strategy for MItigating hallucinations via tokeN reducTion. Specifically, we dynamically intensify the LVLM's local perception capability by masking its attention to…
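The abstract locates the redundancy in deep layers, which suggests a layer-dependent schedule: shallow layers see the full image, and deep layers see only a reduced set of image tokens. The sketch below plans which image tokens stay visible at each layer. The boundary `start_layer`, the `keep_ratio`, and the use of a per-layer importance score are all hypothetical choices. This summary does not state which layers count as "deep" or how MINT scores tokens.

```python
def plan_image_token_visibility(layer_image_scores, start_layer, keep_ratio):
    """Return, for each layer, the sorted indices of image tokens left visible.

    layer_image_scores: per-layer list of importance scores, one per image token
        (e.g. the attention the current text query pays to each image token).
    start_layer: first layer treated as "deep" (hypothetical boundary); earlier
        layers keep every image token.
    keep_ratio: fraction of image tokens kept in deep layers (hypothetical).
    """
    plan = []
    for layer, scores in enumerate(layer_image_scores):
        n = len(scores)
        if layer < start_layer:
            plan.append(list(range(n)))  # shallow layers: full image context
            continue
        n_keep = max(1, round(keep_ratio * n))
        top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:n_keep]
        plan.append(sorted(top))  # deep layers: only the highest-scoring tokens
    return plan


# Usage: 4 layers, 4 image tokens; reduction starts at layer 2.
layer_scores = [
    [0.1, 0.4, 0.2, 0.3],
    [0.1, 0.4, 0.2, 0.3],
    [0.5, 0.1, 0.3, 0.1],
    [0.05, 0.05, 0.6, 0.3],
]
print(plan_image_token_visibility(layer_scores, start_layer=2, keep_ratio=0.5))
```

The plan is recomputed from each layer's own scores, so the visible set can shift between deep layers. This is one way to read "dynamically" in the abstract, not a confirmed detail of the method.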
