Mixture of Decoding: An Attention-Inspired Adaptive Decoding Strategy to Mitigate Hallucinations in Large Vision-Language Models

Xinlong Chen; Yuanxing Zhang; Qiang Liu; Junfei Wu; Fuzheng Zhang; Tieniu Tan

arXiv:2505.17061 · cs.CL · June 11, 2025

TL;DR

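This paper introduces Mixture of Decoding (MoD), an adaptive decoding strategy for large vision-language models (LVLMs) that mitigates hallucinations by checking whether the model's attention on image tokens is correct and then choosing a decoding strategy accordingly: a complementary strategy when attention is correct, a contrastive strategy when it is not.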

Contribution

The paper presents a novel adaptive decoding method, MoD, that dynamically adjusts decoding strategies based on attention correctness to mitigate hallucinations in LVLMs.

Findings

01. MoD significantly reduces hallucinations in LVLMs.
02. MoD outperforms existing decoding methods on multiple benchmarks.
03. The approach effectively distinguishes correct and incorrect attention during decoding.

Abstract

Large Vision-Language Models (LVLMs) have exhibited impressive capabilities across various visual tasks, yet they remain hindered by the persistent challenge of hallucinations. To address this critical issue, we propose Mixture of Decoding (MoD), a novel approach for hallucination mitigation that dynamically adapts decoding strategies by evaluating the correctness of the model's attention on image tokens. Specifically, MoD measures the consistency between outputs generated from the original image tokens and those derived from the model's attended image tokens, to distinguish the correctness aforementioned. If the outputs are consistent, indicating correct attention, MoD employs a complementary strategy to amplify critical information. Conversely, if the outputs are inconsistent, suggesting erroneous attention, MoD utilizes a contrastive strategy to suppress misleading information.…
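
The decision rule described in the abstract can be sketched for a single decoding step as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify how consistency is measured or how the two outputs are combined, so the Jensen-Shannon divergence test, the `threshold`, `alpha`, and `beta` parameters, the two mixing formulas, and the function name `mod_step` are all assumptions made here for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two distributions."""
    def kl(a, b):
        return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)

    mid = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, mid) + 0.5 * kl(q, mid)


def mod_step(logits_orig, logits_attended, threshold=0.1, alpha=1.0, beta=1.0):
    """Pick a decoding strategy for one step (hypothetical sketch).

    logits_orig:     next-token logits given the original image tokens
    logits_attended: next-token logits given only the attended image tokens
    threshold, alpha, beta: illustrative hyperparameters (assumed, not from the paper)
    """
    p = softmax(logits_orig)
    q = softmax(logits_attended)

    # Assumption: low divergence between the two outputs means "consistent",
    # which the abstract interprets as correct attention.
    if js_divergence(p, q) < threshold:
        # Complementary: reinforce the original output with the attended one.
        mode = "complementary"
        mixed = [a + alpha * b for a, b in zip(logits_orig, logits_attended)]
    else:
        # Contrastive: push the output away from the attended (misleading) one.
        mode = "contrastive"
        mixed = [(1 + beta) * a - beta * b
                 for a, b in zip(logits_orig, logits_attended)]
    return mode, softmax(mixed)
```

Calling `mod_step([2.0, 0.5, 0.1], [2.2, 0.4, 0.0])` takes the complementary branch because the two distributions nearly agree, while `mod_step([2.0, 0.5, 0.1], [0.1, 0.5, 3.0])` takes the contrastive branch because they disagree on the most likely token.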

Code & Models

Repositories

xlchen0205/mod (PyTorch, official)

Taxonomy

Methods: Softmax · Attention Is All You Need