Reliable Multimodal Learning Via Multi-Level Adaptive DeConfusion
Tong Zhang, Shu Shen, C. L. Philip Chen

TL;DR
This paper introduces MLAD, a method that reduces inter-class confusion in multimodal learning at both the global and sample levels, improving classification reliability, especially on noisy or low-quality data.
Contribution
MLAD is the first approach to eliminate inter-class confusion at both global and sample levels in multimodal learning, enhancing model reliability.
Findings
MLAD outperforms state-of-the-art methods on multiple benchmarks.
MLAD achieves higher classification confidence in noisy data.
MLAD demonstrates superior reliability in real-world scenarios.
Abstract
Multimodal learning enhances the performance of various machine learning tasks by leveraging complementary information across different modalities. However, existing methods often learn multimodal representations that retain substantial inter-class confusion, making it difficult to achieve high-confidence predictions, particularly in real-world scenarios with low-quality or noisy data. To address this challenge, we propose Multi-Level Adaptive DeConfusion (MLAD), which eliminates inter-class confusion in multimodal data at both global and sample levels, significantly enhancing the classification reliability of multimodal models. Specifically, MLAD first learns class-wise latent distributions with global-level confusion removed via dynamic-exit modality encoders that adapt to the varying discrimination difficulty of each class and a cross-class residual reconstruction mechanism.…
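The abstract is truncated here and does not specify how the dynamic-exit encoders decide when to stop. As a rough illustration only, the sketch below uses a generic confidence-thresholded early exit: an input passes through encoder stages one at a time and leaves as soon as the top class probability clears a threshold, so easier inputs use fewer stages. This is not the authors' implementation. The exit rule, the threshold value, and the names `dynamic_exit_predict`, `double`, `identity`, and `toy_stages` are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_exit_predict(features, stages, threshold=0.9):
    """Run encoder stages in order and exit at the first confident one.

    `stages` is a list of (transform, classifier) pairs of callables.
    This exit rule (top-class probability >= threshold) is an assumed
    stand-in, not the criterion used in the paper.
    Returns (predicted_class, exit_depth, probabilities).
    """
    if not stages:
        raise ValueError("at least one stage is required")
    hidden = features
    for depth, (transform, classifier) in enumerate(stages, start=1):
        hidden = transform(hidden)
        probs = softmax(classifier(hidden))
        if max(probs) >= threshold:
            break  # confident enough: skip the remaining stages
    return probs.index(max(probs)), depth, probs


# Hypothetical toy stages: each one doubles the feature vector, and the
# classifier simply reads the features as logits.
def double(hidden):
    return [2.0 * x for x in hidden]


def identity(hidden):
    return list(hidden)


toy_stages = [(double, identity)] * 3

easy = dynamic_exit_predict([2.0, 0.0], toy_stages)
hard = dynamic_exit_predict([0.5, 0.0], toy_stages)
print("easy input exits at stage", easy[1])
print("hard input exits at stage", hard[1])
```

In this toy setup the well-separated input exits after one stage, while the ambiguous input needs all three. That mirrors the idea of adapting encoder depth to discrimination difficulty, though the paper may define difficulty per class rather than per input.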
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications
