G$^{2}$D: Boosting Multimodal Learning with Gradient-Guided Distillation
Mohammed Rakib, Arunkumar Bagavathi

TL;DR
G$^{2}$D introduces a gradient-guided distillation framework with dynamic modality prioritization that improves multimodal learning by balancing the contributions of all modalities, especially the weaker ones.
Contribution
This paper proposes G$^{2}$D, a novel knowledge distillation method with dynamic modality prioritization to address modality imbalance in multimodal learning.
Findings
Outperforms state-of-the-art methods in classification tasks.
Enhances the contribution of weak modalities during training.
Validated on multiple real-world datasets.
Abstract
Multimodal learning aims to leverage information from diverse data modalities to achieve more comprehensive performance. However, conventional multimodal models often suffer from modality imbalance, where one or a few modalities dominate model optimization, leading to suboptimal feature representations and underutilization of weak modalities. To address this challenge, we introduce Gradient-Guided Distillation (G$^{2}$D), a knowledge distillation framework that optimizes the multimodal model with a custom-built loss function that fuses both unimodal and multimodal objectives. G$^{2}$D further incorporates a dynamic sequential modality prioritization (SMP) technique into training so that each modality takes a turn leading the learning process, which prevents stronger modalities from overshadowing weaker ones. We validate G$^{2}$D on multiple real-world datasets and show that G$^{2}$D…
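The abstract does not spell out the loss or the prioritization schedule, so the following is only a minimal sketch of one plausible reading, not the authors' formulation. It assumes that the student model exposes per-modality logits alongside its fused logits, that a pretrained unimodal teacher exists for each modality, and that SMP can be approximated by a round-robin schedule in which the currently leading modality's terms are up-weighted. All names (`fused_loss`, `leading_modality`, `alpha`, `beta`, `temperature`, `priority_weight`) are hypothetical, and the gradient-guided part of G$^{2}$D is not modeled here.

```python
import math
from typing import Dict, List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with an optional distillation temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Negative log-likelihood of the true class."""
    return -math.log(softmax(logits)[label])


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def leading_modality(epoch: int, modalities: Sequence[str], phase_len: int) -> str:
    """Hypothetical SMP stand-in: each modality leads for `phase_len` epochs in turn."""
    return modalities[(epoch // phase_len) % len(modalities)]


def fused_loss(
    fused_logits: Sequence[float],
    student_logits: Dict[str, Sequence[float]],
    teacher_logits: Dict[str, Sequence[float]],
    label: int,
    leader: str,
    alpha: float = 1.0,
    beta: float = 1.0,
    temperature: float = 2.0,
    priority_weight: float = 2.0,
) -> float:
    """Multimodal CE + per-modality unimodal CE + per-modality distillation (assumed form)."""
    loss = cross_entropy(fused_logits, label)  # multimodal objective
    for name, logits in student_logits.items():
        # Assumption: the leading modality's unimodal terms get a larger weight.
        w = priority_weight if name == leader else 1.0
        unimodal = cross_entropy(logits, label)  # unimodal objective
        p_teacher = softmax(teacher_logits[name], temperature)
        p_student = softmax(logits, temperature)
        # Temperature-squared scaling is the usual soft-target distillation convention.
        distill = temperature ** 2 * kl_divergence(p_teacher, p_student)
        loss += w * (alpha * unimodal + beta * distill)
    return loss


if __name__ == "__main__":
    modalities = ["audio", "visual"]
    leader = leading_modality(epoch=3, modalities=modalities, phase_len=2)  # -> "visual"
    student = {"audio": [1.0, 0.2], "visual": [0.3, 0.4]}
    teacher = {"audio": [1.5, 0.1], "visual": [0.9, 0.2]}
    print(leader, fused_loss([2.0, 0.5], student, teacher, label=0, leader=leader))
```

Up-weighting the leader, rather than freezing the other modalities, keeps every unimodal term in the objective while still letting one modality dominate the gradient for a phase. That is a design choice of this sketch, not something the abstract states.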
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Multimodal Machine Learning Applications
Methods: Knowledge Distillation
