Decoding by Perturbation: Mitigating MLLM Hallucinations via Dynamic Textual Perturbation
Sihang Jia, Shuliang Liu, Songbo Yang, Yibo Yan, Xin Zou, Xuming Hu

TL;DR
This paper introduces DeP, a training-free method that reduces hallucinations in multimodal large language models by applying controlled textual perturbations during decoding.
Contribution
DeP offers a novel, training-free approach that mitigates hallucinations through dynamic textual interventions and attention variance analysis during decoding.
Findings
DeP significantly reduces hallucinations across multiple benchmarks.
It improves the stability of visual grounding during decoding.
DeP outperforms existing mitigation methods in various evaluations.
Abstract
Multimodal Large Language Models frequently suffer from inference hallucinations, partially stemming from language priors dominating visual evidence. Existing training-free mitigation methods either perturb the visual representation, deviating from the natural image distribution, or enforce intrusive manipulations that compromise the model's inherent generative fluency. We introduce a novel perspective: multimodal hallucination manifests as the hypersensitivity of visual grounding to textual phrasing during the decoding phase. Building on this insight, we propose Decoding by Perturbation (DeP), a training-free framework that mitigates prior-induced hallucinations via controlled textual interventions. DeP employs a dynamic probe that applies multi-level textual perturbations to elicit latent language priors. Leveraging attention variance, it enhances stable evidence regions while…
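The abstract is truncated and does not say how DeP computes or applies attention variance, so the following is only a hypothetical illustration of the general idea of "enhancing stable evidence regions", not the authors' implementation. It assumes we already have, for each of K perturbed phrasings of the same prompt, the attention mass the model places on N image regions. Regions whose attention barely changes across phrasings are treated as stable visual evidence and up-weighted. The function names `stability_weights`, `mean_attention`, and `reweight`, the parameter `alpha`, and the linear weighting rule are all assumptions made for this sketch.

```python
from statistics import mean, pvariance


def stability_weights(attn_maps, alpha=1.0):
    """Hypothetical per-region weights from attention variance.

    attn_maps: K lists (one per perturbed prompt), each holding the attention
    mass on the same N image regions. Regions with low variance across the
    perturbations (stable grounding) get weights up to 1 + alpha; the most
    variable region gets weight 1.0.
    """
    num_regions = len(attn_maps[0])
    variances = [pvariance([m[i] for m in attn_maps]) for i in range(num_regions)]
    max_var = max(variances)
    if max_var == 0:
        # Every region is equally stable: leave attention unchanged.
        return [1.0] * num_regions
    return [1.0 + alpha * (1.0 - v / max_var) for v in variances]


def mean_attention(attn_maps):
    """Average attention per region over all perturbed prompts."""
    return [mean(col) for col in zip(*attn_maps)]


def reweight(attn, weights):
    """Scale attention by the stability weights and renormalize to sum to 1."""
    scaled = [a * w for a, w in zip(attn, weights)]
    total = sum(scaled)
    return [s / total for s in scaled]


if __name__ == "__main__":
    # Three perturbed phrasings, three image regions (toy numbers).
    attn_maps = [
        [0.5, 0.3, 0.2],
        [0.5, 0.1, 0.4],
        [0.5, 0.2, 0.3],
    ]
    weights = stability_weights(attn_maps)
    print(reweight(mean_attention(attn_maps), weights))
```

In this toy input, region 0 receives the same attention under every phrasing, so its share rises after reweighting while the two phrasing-sensitive regions shrink. Scaling by variance relative to the maximum keeps the weights bounded between 1 and 1 + `alpha`; a real decoder would likely apply such weights inside the attention layers rather than to averaged maps.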
