Omni-Modal Dissonance Benchmark: Systematically Breaking Modality Consensus to Probe Robustness and Calibrated Abstention
Zabir Al Nazi, Shubhashis Roy Dipta, Md Rizwan Parvez

TL;DR
OMD-Bench is a benchmark that evaluates the robustness, modality reliance, and calibrated abstention of omni-modal models by systematically corrupting modalities and analyzing how the models respond.
Contribution
The paper introduces OMD-Bench, a novel benchmark that isolates each modality's contribution and assesses models' confidence and abstention behavior under corruption.
Findings
Models over-abstain when two modalities are corrupted.
Models severely under-abstain when all three modalities are corrupted.
Chain-of-thought prompting increases confidence but amplifies overconfidence.
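The over- and under-abstention findings compare how often a model declines to answer at each corruption level. A minimal sketch of that tally follows, assuming each evaluation record is reduced to a pair (number of corrupted modalities, whether the model abstained). The record format, the function name, and the toy data are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict

def abstention_rate_by_corruption(records):
    """Map each corruption level to the fraction of instances where the model abstained.

    records: iterable of (num_corrupted, abstained) pairs (hypothetical format).
    """
    totals = defaultdict(int)
    abstained = defaultdict(int)
    for num_corrupted, did_abstain in records:
        totals[num_corrupted] += 1
        abstained[num_corrupted] += int(did_abstain)
    return {level: abstained[level] / totals[level] for level in sorted(totals)}

# Made-up toy records for illustration only, not benchmark results.
records = [
    (2, True), (2, True), (2, False),
    (3, False), (3, False), (3, True), (3, False),
]
rates = abstention_rate_by_corruption(records)
```

Comparing these per-level rates against a reference, such as human abstention rates on the same instances, is one way to express over- or under-abstention as a signed gap.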
Abstract
Existing omni-modal benchmarks attempt to measure modality-specific contributions, but their measurements are confounded: naturally co-occurring modalities carry correlated yet unequal information, making it unclear whether results reflect true modality reliance or information asymmetry. We introduce OMD-Bench, where all modalities are initially congruent - each presenting the same anchor, an object or event independently perceivable through video, audio, and text - which we then systematically corrupt to isolate each modality's contribution. We also evaluate calibrated abstention: whether models appropriately refrain from answering when evidence is conflicting. The benchmark comprises 4,080 instances spanning 27 anchors across eight corruption conditions. Evaluating ten omni-modal models under zero-shot and chain-of-thought prompting, we find that models over-abstain when two…
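One way to arrive at eight corruption conditions from three modalities is to treat each modality as either intact or corrupted, which gives 2^3 = 8 combinations. Whether the paper defines its conditions exactly this way is an assumption. The sketch below only enumerates the combinations, and the names in it are illustrative.

```python
from itertools import product

# The three modalities named in the abstract.
MODALITIES = ("video", "audio", "text")

def corruption_conditions(modalities=MODALITIES):
    """Return every subset of modalities to corrupt, as tuples of modality names."""
    conditions = []
    for mask in product((False, True), repeat=len(modalities)):
        corrupted = tuple(m for m, flag in zip(modalities, mask) if flag)
        conditions.append(corrupted)
    return conditions

conditions = corruption_conditions()
for corrupted in conditions:
    label = "+".join(corrupted) if corrupted else "none (fully congruent)"
    print(f"corrupt: {label}")
```

Under this assumed design, the empty tuple is the fully congruent baseline, the three single-modality conditions isolate each modality's contribution, and the full tuple is the case where no modality carries reliable evidence.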
