Reliable Multimodal Learning Via Multi-Level Adaptive DeConfusion

Tong Zhang; Shu Shen; C. L. Philip Chen

arXiv:2502.19674 · cs.CV · December 1, 2025

Open Access

TL;DR

This paper introduces MLAD, a method that reduces inter-class confusion at both the global and sample levels in multimodal learning, improving classification reliability, especially on noisy data.

Contribution

MLAD is the first approach to eliminate inter-class confusion at both global and sample levels in multimodal learning, enhancing model reliability.

Findings

01. MLAD outperforms state-of-the-art methods on multiple benchmarks.
02. MLAD achieves higher classification confidence in noisy data.
03. MLAD demonstrates superior reliability in real-world scenarios.

Abstract

Multimodal learning enhances the performance of various machine learning tasks by leveraging complementary information across different modalities. However, existing methods often learn multimodal representations that retain substantial inter-class confusion, making it difficult to achieve high-confidence predictions, particularly in real-world scenarios with low-quality or noisy data. To address this challenge, we propose Multi-Level Adaptive DeConfusion (MLAD), which eliminates inter-class confusion in multimodal data at both global and sample levels, significantly enhancing the classification reliability of multimodal models. Specifically, MLAD first learns class-wise latent distributions with global-level confusion removed via dynamic-exit modality encoders that adapt to the varying discrimination difficulty of each class and a cross-class residual reconstruction mechanism.…

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications