Unbiased Dynamic Multimodal Fusion

Shicai Wei; Kaijie Zhang; Luyi Chen; Tao He; Guiduo Duan

arXiv:2603.19681·cs.CV·March 23, 2026

TL;DR

This paper introduces UDML, a novel framework for unbiased dynamic multimodal fusion that accurately assesses modality quality across noise levels and corrects for modality bias, improving fusion performance.

Contribution

The paper proposes a noise-aware uncertainty estimator and a bias correction mechanism, advancing dynamic multimodal fusion by addressing limitations of empirical metrics and modality bias.

Findings

01 Improved uncertainty estimation across noise conditions

02 Effective bias correction enhances modality contribution balance

03 Validated on diverse multimodal benchmark tasks

Abstract

Traditional multimodal methods often assume static modality quality, which limits their adaptability in dynamic real-world scenarios. Dynamic multimodal methods have therefore been proposed to assess modality quality and adjust each modality's contribution accordingly. However, they typically rely on empirical metrics, which fail to measure modality quality when noise levels are extremely low or high. Moreover, existing methods usually assume that the initial contribution of each modality is the same, neglecting the intrinsic modality dependency bias. As a result, a modality that is hard to learn is doubly penalized, and dynamic fusion can perform worse than static fusion. To address these challenges, we propose the Unbiased Dynamic Multimodal Learning (UDML) framework. Specifically, we introduce a noise-aware uncertainty estimator that adds controlled noise to the modality…
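To make the notion of dynamic fusion concrete, here is a minimal sketch of the kind of baseline the abstract criticizes: each modality's contribution is set by an empirical quality metric, in this case predictive entropy. This is not the UDML method. The function names, the choice of entropy as the metric, and the example logits are all assumptions made for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def dynamic_fusion(logits_per_modality):
    """Fuse per-modality class logits with entropy-based weights.

    Assumption: predictive entropy stands in for an empirical quality
    metric; lower entropy is treated as higher modality quality.
    """
    probs = [softmax(logits) for logits in logits_per_modality]
    uncertainties = [entropy(p) for p in probs]
    # Lower uncertainty -> larger fusion weight.
    weights = softmax([-u for u in uncertainties])
    n_classes = len(logits_per_modality[0])
    fused = [
        sum(w * p[c] for w, p in zip(weights, probs))
        for c in range(n_classes)
    ]
    return weights, fused


# Hypothetical logits: a confident modality and a near-uniform one.
audio_logits = [4.0, 0.5, 0.2]
video_logits = [0.3, 0.4, 0.2]
weights, fused = dynamic_fusion([audio_logits, video_logits])
```

In this sketch every modality starts from the same footing and is reweighted only by its measured uncertainty. That is the equal-initial-contribution assumption the abstract says can doubly penalize a modality that is hard to learn.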

Taxonomy

Topics: Emotion and Mood Recognition · Music and Audio Processing · Obstructive Sleep Apnea Research