Quantifying Epistemic Uncertainty in Multimodal Long-Tailed Classification: A Belief Entropy-Based Evidential Fusion Framework
Guorui Zhu

TL;DR
This paper introduces a new framework to handle uncertainty and improve fairness in multimodal classification tasks with imbalanced data.
Contribution
The novel framework, UMuLT, combines evidential reasoning with deep learning to address uncertainty and class imbalance in multimodal settings.
Findings
UMuLT improves performance on tail classes in long-tailed multimodal classification tasks.
The framework outperforms existing methods in overall metrics and calibration.
Statistical significance tests validate the effectiveness of the proposed approach.
Abstract
Deep multimodal learning has excelled in tasks involving vision, language, and audio modalities. However, performance on tail classes degrades significantly under the long-tailed distributions common in real-world data. Meanwhile, existing fusion schemes often provide only limited treatment of modality-specific uncertainty and rarely incorporate explicit mechanisms for class-level fairness. To address these issues, we present Uncertainty-Quantified Multimodal Learning for Long-Tailed Classification (UMuLT), a framework that integrates evidential reasoning with deep learning. The framework includes: (i) an uncertainty-gated evidential fusion module that adaptively down-weights unreliable modalities; (ii) an exponential moving average (EMA) fairness regularizer that dynamically amplifies tail-class gradients; and (iii) a cross-modal consistency…
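The abstract is truncated here and gives no formulas, so the following is only a minimal sketch of what components (i) and (ii) could look like. It assumes the standard subjective-logic uncertainty u = K / S for Dirichlet evidence, not the paper's belief-entropy measure, and an EMA of per-class loss as the fairness signal. All function names, the gating rule (1 - u), and the momentum value are hypothetical and are not taken from UMuLT.

```python
from typing import List, Sequence


def dirichlet_uncertainty(evidence: Sequence[float]) -> float:
    """Subjective-logic uncertainty u = K / S, with alpha_k = e_k + 1 and S = sum(alpha).

    ASSUMPTION: a stand-in for the paper's belief-entropy measure.
    """
    if any(e < 0 for e in evidence):
        raise ValueError("evidence must be non-negative")
    k = len(evidence)
    return k / (sum(evidence) + k)


def gated_fusion(evidences: Sequence[Sequence[float]]) -> List[float]:
    """Fuse per-modality evidence, weighting each modality by (1 - u).

    ASSUMPTION: a simple normalized gate; the paper's gating rule may differ.
    """
    k = len(evidences[0])
    gates = [1.0 - dirichlet_uncertainty(e) for e in evidences]
    total = sum(gates)
    if total == 0.0:  # no modality carries any evidence
        return [0.0] * k
    weights = [g / total for g in gates]
    return [sum(w * e[c] for w, e in zip(weights, evidences)) for c in range(k)]


def ema_update(ema: Sequence[float], per_class_loss: Sequence[float],
               momentum: float = 0.9) -> List[float]:
    """Exponential moving average of per-class loss (hypothetical fairness signal)."""
    return [momentum * m + (1.0 - momentum) * l for m, l in zip(ema, per_class_loss)]


def class_weights(ema: Sequence[float]) -> List[float]:
    """Scale each class by its EMA loss relative to the mean, so harder classes weigh more."""
    mean = sum(ema) / len(ema)
    if mean == 0.0:
        return [1.0] * len(ema)
    return [m / mean for m in ema]


# Usage: a confident vision branch and a near-uninformative audio branch.
vision = [9.0, 1.0, 0.0]
audio = [0.2, 0.3, 0.1]
fused = gated_fusion([vision, audio])

# Usage: class 1 (a tail class in this toy example) has the higher running loss.
ema = ema_update([1.0, 1.0], [0.5, 2.0])
weights = class_weights(ema)
```

In this toy example the audio branch has higher uncertainty than the vision branch, so it contributes less to the fused evidence, and the class with the larger running loss receives the larger weight. Whether UMuLT gates or reweights in this way is not stated in the visible text.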
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Explainable Artificial Intelligence (XAI) · Speech Recognition and Synthesis
