Omni-Modal Dissonance Benchmark: Systematically Breaking Modality Consensus to Probe Robustness and Calibrated Abstention

Zabir Al Nazi; Shubhashis Roy Dipta; Md Rizwan Parvez

arXiv:2603.27187 · cs.LG · March 31, 2026

TL;DR

OMD-Bench is a benchmark for evaluating the robustness, modality reliance, and calibrated abstention of omni-modal models. It systematically corrupts individual modalities and analyzes how models respond.

Contribution

The paper introduces OMD-Bench, a novel benchmark that isolates each modality's contribution and assesses models' confidence and abstention behavior under corruption.

Findings

1. Models over-abstain when two modalities are corrupted.
2. Models severely under-abstain when all three modalities are corrupted.
3. Chain-of-thought prompting increases confidence but amplifies overconfidence.

Abstract

Existing omni-modal benchmarks attempt to measure modality-specific contributions, but their measurements are confounded: naturally co-occurring modalities carry correlated yet unequal information, making it unclear whether results reflect true modality reliance or information asymmetry. We introduce OMD-Bench, in which all modalities are initially congruent: each presents the same anchor, an object or event independently perceivable through video, audio, and text. We then systematically corrupt these modalities to isolate each one's contribution. We also evaluate calibrated abstention: whether models appropriately refrain from answering when evidence is conflicting. The benchmark comprises 4,080 instances spanning 27 anchors across eight corruption conditions. Evaluating ten omni-modal models under zero-shot and chain-of-thought prompting, we find that models over-abstain when two…
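The abstract reports eight corruption conditions over three modalities. One reading consistent with that count, which is an assumption and not confirmed by the text here, is that each condition is a subset of {video, audio, text} chosen for corruption, giving 2^3 = 8 conditions from fully congruent to fully corrupted. The sketch below enumerates those subsets and groups abstention records by how many modalities are corrupted, the axis along which the findings contrast two-modality and three-modality corruption. The function names, record format, and toy data are hypothetical and do not reflect the benchmark's actual code or results.

```python
from collections import defaultdict
from itertools import combinations

# Assumed modality set; the abstract names video, audio, and text.
MODALITIES = ("video", "audio", "text")


def corruption_conditions(modalities=MODALITIES):
    """Enumerate every subset of modalities to corrupt, from none to all.

    Hypothetical helper: with three modalities this yields 2**3 = 8 subsets.
    """
    conditions = []
    for k in range(len(modalities) + 1):
        for subset in combinations(modalities, k):
            conditions.append(frozenset(subset))
    return conditions


def abstention_rate_by_level(records):
    """Compute the fraction of abstentions per number of corrupted modalities.

    `records` is a hypothetical iterable of (corrupted_set, abstained) pairs.
    Returns a dict mapping corruption level (0..3) to abstention rate.
    """
    totals = defaultdict(int)
    abstains = defaultdict(int)
    for corrupted, abstained in records:
        level = len(corrupted)
        totals[level] += 1
        abstains[level] += int(abstained)
    return {level: abstains[level] / totals[level] for level in sorted(totals)}


conds = corruption_conditions()
print(len(conds))  # 8

# Made-up records purely to show the grouping, not benchmark results.
toy = [
    (frozenset(), False),
    (frozenset({"video", "audio"}), True),
    (frozenset({"video", "text"}), False),
    (frozenset(MODALITIES), False),
]
print(abstention_rate_by_level(toy))  # {0: 0.0, 2: 0.5, 3: 0.0}
```

Grouping by corruption level rather than by individual subset makes it easy to compare abstention at two versus three corrupted modalities; a per-subset breakdown would be needed to study reliance on a specific modality.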

Code & Models

Datasets

zabir-nabil/OMD-Bench
dataset · 97 downloads