MoD-DPO: Towards Mitigating Cross-modal Hallucinations in Omni LLMs using Modality Decoupled Preference Optimization

Ashutosh Chaubey; Jiacheng Pang; Mohammad Soleymani

arXiv:2603.03192·cs.CV·March 31, 2026

TL;DR

This paper introduces MoD-DPO, a framework that enhances the reliability of omni-modal large language models by reducing cross-modal hallucinations through modality-aware regularization and language-prior debiasing.

Contribution

The paper presents a novel modality-decoupled preference optimization method that improves modality grounding and reduces hallucinations in omni LLMs.

Findings

1. MoD-DPO outperforms previous methods on audiovisual hallucination benchmarks.

2. It improves perception accuracy and hallucination resistance.

3. The approach demonstrates scalable enhancement of multimodal model reliability.

Abstract

Omni-modal large language models (omni LLMs) have recently achieved strong performance across audiovisual understanding tasks, yet they remain highly susceptible to cross-modal hallucinations arising from spurious correlations and dominant language priors. In this work, we propose Modality-Decoupled Direct Preference Optimization (MoD-DPO), a simple and effective framework for improving modality grounding in omni LLMs. MoD-DPO introduces modality-aware regularization terms that explicitly enforce invariance to corruptions in irrelevant modalities and sensitivity to perturbations in relevant modalities, thereby reducing unintended cross-modal interactions. To further mitigate over-reliance on textual priors, we incorporate a language-prior debiasing penalty that discourages hallucination-prone text-only responses. Extensive experiments across multiple audiovisual hallucination benchmarks…
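
The abstract names four ingredients (a preference objective, invariance to corruptions in the irrelevant modality, sensitivity to perturbations in the relevant modality, and a language-prior debiasing penalty) but does not state their exact form here. The sketch below is only one way such terms could be combined, written on scalar sequence log-probabilities. The `dpo_loss` function follows the standard DPO formulation; the invariance, sensitivity, and language-prior terms, along with every name and weight (`lam_inv`, `lam_sens`, `lam_lp`, the `*_corrupt` inputs, and so on), are illustrative assumptions and should not be read as the paper's actual objective.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(pol_w, ref_w, pol_l, ref_l, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    Inputs are sequence log-probabilities under the policy (pol_*) and the
    frozen reference model (ref_*); w = chosen response, l = rejected response.
    """
    margin = (pol_w - ref_w) - (pol_l - ref_l)
    return -log_sigmoid(beta * margin)


def decoupled_loss(
    pol_w, ref_w, pol_l, ref_l,
    pol_w_irrelevant_corrupt,  # chosen log-prob with the irrelevant modality corrupted
    pol_w_relevant_corrupt,    # chosen log-prob with the relevant modality corrupted
    pol_t, ref_t,              # log-probs of a response produced from text only
    beta=0.1, lam_inv=1.0, lam_sens=1.0, lam_lp=1.0,
):
    """Hypothetical combination of the four ingredients named in the abstract."""
    base = dpo_loss(pol_w, ref_w, pol_l, ref_l, beta)

    # ASSUMPTION: invariance as a squared shift in the chosen log-prob when
    # the modality that should not matter is corrupted.
    invariance = (pol_w - pol_w_irrelevant_corrupt) ** 2

    # ASSUMPTION: sensitivity as a preference for the clean input over the
    # input whose relevant modality was perturbed.
    sensitivity = -log_sigmoid(beta * (pol_w - pol_w_relevant_corrupt))

    # ASSUMPTION: language-prior debiasing as a DPO-style term that treats the
    # text-only response as an extra rejected sample.
    language_prior = dpo_loss(pol_w, ref_w, pol_t, ref_t, beta)

    return (base + lam_inv * invariance + lam_sens * sensitivity
            + lam_lp * language_prior)


# Toy numbers, chosen only to exercise the function.
example = decoupled_loss(
    pol_w=-12.0, ref_w=-14.0, pol_l=-15.0, ref_l=-13.0,
    pol_w_irrelevant_corrupt=-12.5, pol_w_relevant_corrupt=-20.0,
    pol_t=-16.0, ref_t=-13.5,
)
print(f"toy loss: {example:.4f}")
```

In a real training loop these scalars would be per-example log-probabilities from forward passes on clean and corrupted audio/video inputs; the point of the sketch is only that each regularizer can be expressed as an extra term on log-probabilities the policy already computes.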
