MAD: Modality-Adaptive Decoding for Mitigating Cross-Modal Hallucinations in Multimodal Large Language Models

Sangyun Chung, Se Yeon Kim, Youngchae Chee, and Yong Man Ro

arXiv:2601.21181 · cs.AI · January 30, 2026

Open Access

TL;DR

This paper introduces MAD, a training-free, modality-adaptive decoding method that reduces cross-modal hallucinations in multimodal large language models by dynamically weighting modality-specific branches based on task relevance.

Contribution

MAD is a novel, training-free approach that leverages model self-assessment to adaptively weight modality-specific decoding, improving multimodal reasoning robustness.

Findings

01 MAD significantly reduces cross-modal hallucinations in experiments.

02 It improves performance on CMM and AVHBench benchmarks.

03 The approach enhances model focus on relevant modalities.

Abstract

Multimodal Large Language Models (MLLMs) suffer from cross-modal hallucinations, where one modality inappropriately influences generation about another, leading to fabricated output. This exposes a more fundamental deficiency in modality-interaction control. To address this, we propose Modality-Adaptive Decoding (MAD), a training-free method that adaptively weights modality-specific decoding branches based on task requirements. MAD leverages the model's inherent ability to self-assess modality relevance by querying which modalities are needed for each task. The extracted modality probabilities are then used to adaptively weight contrastive decoding branches, enabling the model to focus on relevant information while suppressing cross-modal interference. Extensive experiments on CMM and AVHBench demonstrate that MAD significantly reduces cross-modal hallucinations across multiple…
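The weighting step described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not the paper's formulation: the function names, the three self-assessment answers ("audio", "video", "both"), the way "both" is folded into each modality's relevance, the additive contrast form, and the `alpha` scale are all hypothetical. The sketch only shows the general idea of scaling per-modality contrastive terms by relevance probabilities.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def modality_relevance(self_assessment_scores):
    """Turn hypothetical self-assessment scores into per-modality weights.

    `self_assessment_scores` maps the assumed answers "audio", "video",
    and "both" to raw scores. Counting "both" toward each modality is an
    illustrative choice, not something stated in the source.
    """
    keys = ["audio", "video", "both"]
    probs = dict(zip(keys, softmax([self_assessment_scores[k] for k in keys])))
    p_audio = probs["audio"] + probs["both"]
    p_video = probs["video"] + probs["both"]
    return p_audio, p_video


def weighted_contrastive_logits(full, without_audio, without_video,
                                p_audio, p_video, alpha=1.0):
    """Combine next-token logits from three forward passes.

    Each contrast (full minus the pass with one modality removed) estimates
    what that modality contributes; it is scaled by that modality's
    relevance weight. The additive form and `alpha` are assumptions.
    """
    adjusted = []
    for f, no_a, no_v in zip(full, without_audio, without_video):
        audio_contrast = f - no_a
        video_contrast = f - no_v
        adjusted.append(f + alpha * (p_audio * audio_contrast
                                     + p_video * video_contrast))
    return adjusted


# Toy two-token vocabulary with made-up logits for an audio-focused question.
full = [2.0, 1.0]
without_audio = [2.5, 0.5]
without_video = [1.0, 1.5]
p_audio, p_video = modality_relevance({"audio": 2.0, "video": -1.0, "both": 0.0})
adjusted = weighted_contrastive_logits(full, without_audio, without_video,
                                       p_audio, p_video)
```

With both weights at zero the adjusted logits reduce to the unmodified full-input logits, so in this sketch the relevance weights alone control how strongly each modality's contrast steers decoding.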

Taxonomy

Topics: Multimodal Machine Learning Applications · Music and Audio Processing · Generative Adversarial Networks and Image Synthesis