MuteBench: Modality Unavailability Tolerance Evaluation for Incomplete Multimodal Fusion
Wugeng Zheng, Ziwen Kan, Tianlong Chen, Chen Chen, Song Wang

TL;DR
MuteBench is a comprehensive benchmark evaluating the robustness of multimodal fusion architectures to missing data in clinical AI, revealing architecture family as a key predictor of failure tolerance.
Contribution
It introduces MuteBench, the first benchmark to assess multiple fusion architectures across diverse clinical datasets and failure modes at controlled severity levels.
Findings
Architecture family is the strongest predictor of robustness to missing data, outweighing parameter count.
Channel-independent models tolerate modality missing well but can be sensitive to within-modality missing, especially on short sequences.
Diffusion-based imputation can improve classification under within-modality missing.
Abstract
Multimodal physiological data powers clinical AI systems from intensive care units to wearable devices, but sensors routinely fail in practice. Two failure modes are common: modality missing, where an entire channel is absent, and within-modality missing, where a contiguous time segment is lost. No existing benchmark evaluates multiple fusion architectures under both failure modes at controlled severity levels across diverse clinical datasets. We present MuteBench, a benchmark covering 9 datasets from 7 clinical domains, 6 fusion architectures, and 2 missing-data modes over 125,000 samples. Through this benchmark, we find that architecture family is the strongest predictor of robustness, outweighing parameter count. Channel-independent models tolerate modality missing well but can be sensitive to within-modality missing, especially on short sequences. Curriculum modality dropout…
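To make the two failure modes concrete, the following is a minimal sketch of how each might be simulated on a single sample. It is an illustration under stated assumptions, not the benchmark's own code: the function names `mask_modality` and `mask_segment`, the `(channels, time)` array layout, zero-filling of lost values, and the reading of "severity" as the fraction of time steps removed are all hypothetical choices made here.

```python
import numpy as np


def mask_modality(x, channel, fill=0.0):
    """Simulate modality missing: an entire channel is absent.

    x is assumed to have shape (channels, time); zero-fill is an assumption.
    """
    out = x.copy()
    out[channel, :] = fill
    return out


def mask_segment(x, channel, severity, rng, fill=0.0):
    """Simulate within-modality missing: one contiguous segment is lost.

    severity is assumed to be the fraction of time steps removed, in [0, 1].
    """
    if not 0.0 <= severity <= 1.0:
        raise ValueError("severity must be in [0, 1]")
    out = x.copy()
    n_steps = x.shape[1]
    seg_len = int(round(severity * n_steps))
    if seg_len == 0:
        return out
    # Pick a start so the whole segment fits inside the sequence.
    start = rng.integers(0, n_steps - seg_len + 1)
    out[channel, start:start + seg_len] = fill
    return out


rng = np.random.default_rng(0)
x = rng.standard_normal((3, 100))  # toy sample: 3 channels, 100 time steps

x_mod = mask_modality(x, channel=1)
x_seg = mask_segment(x, channel=2, severity=0.3, rng=rng)
```

Sweeping `severity` over a fixed grid and applying the same masks to every architecture is one way to obtain the kind of controlled severity levels the abstract describes.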
