MINT: Mitigating Hallucinations in Large Vision-Language Models via Token Reduction
Chao Wang, Jianming Yang, Yang Zhou

TL;DR
This paper introduces MINT, a training-free decoding strategy that reduces hallucinations in large vision-language models by masking irrelevant image tokens and emphasizing key visual regions, leading to improved perception and reliability.
Contribution
MINT is a novel, training-free method that mitigates hallucinations in LVLMs by dynamically reducing attention to non-essential tokens and enhancing focus on key visual elements.
Findings
Achieves a 4% reduction in hallucinations on benchmarks
Lets the model perceive 5% more visual points despite the reduction in image tokens
Improves model reliability in high-stakes domains
Abstract
Hallucination has been a long-standing and inevitable problem that hinders the application of Large Vision-Language Models (LVLMs) in domains that require high reliability. Various methods seek improvement through data annotations or training strategies, yet place less emphasis on the LLM's inherent problems. To fill this gap, we delve into the attention mechanism of the decoding process in the LVLM. Intriguingly, our investigation uncovers prevalent attention redundancy within the hierarchical architecture of the LVLM, manifesting as overextended image processing in deep layers and an overabundance of non-essential image tokens. Building on this observation, we propose MINT (MItigating hallucinations via tokeN reducTion), a novel training-free decoding strategy. Specifically, we dynamically intensify the LVLM's local perception capability by masking its attention to…
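The idea of masking attention to non-essential image tokens can be illustrated with a minimal sketch. This is not the paper's algorithm: the selection rule used here (keep the image tokens with the highest attention logits) and the names `masked_attention_weights` and `keep_ratio` are assumptions made purely for illustration, since the abstract above is truncated before the actual criterion is described.

```python
import numpy as np


def masked_attention_weights(scores, is_image, keep_ratio=0.5):
    """Softmax over `scores` after masking the lowest-scoring image tokens.

    scores:     1-D array of pre-softmax attention logits for one query.
    is_image:   1-D boolean array, True where the key is an image token.
    keep_ratio: fraction of image tokens to keep (hypothetical parameter,
                not taken from the paper).
    """
    scores = np.asarray(scores, dtype=float)
    is_image = np.asarray(is_image, dtype=bool)

    image_idx = np.flatnonzero(is_image)
    n_keep = max(1, int(np.ceil(keep_ratio * image_idx.size)))

    # Assumed criterion: rank image tokens by logit, keep the top n_keep.
    ranked = image_idx[np.argsort(scores[image_idx])[::-1]]
    dropped = ranked[n_keep:]

    masked = scores.copy()
    masked[dropped] = -np.inf  # masked tokens receive zero attention weight

    # Numerically stable softmax; exp(-inf) evaluates to 0.
    masked -= masked.max()
    weights = np.exp(masked)
    return weights / weights.sum()


# Four image tokens followed by one text token; half the image tokens are kept.
scores = [2.0, 0.1, 1.5, -1.0, 0.3]
is_image = [True, True, True, True, False]
weights = masked_attention_weights(scores, is_image, keep_ratio=0.5)
```

In this toy input the two weakest image tokens (positions 1 and 3) end up with zero weight, and the remaining attention mass is renormalized over the kept image tokens and the text token. Because the mask is applied to logits at decoding time, no model weights change, which is consistent with the training-free framing above.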
Taxonomy
Topics: COVID-19 diagnosis using AI · Anomaly Detection Techniques and Applications · Brain Tumor Detection and Classification
Methods: Softmax · Attention Is All You Need · Focus
