MSD: Saliency-aware Knowledge Distillation for Multimodal Understanding
Woojeong Jin, Maziar Sanjabi, Shaoliang Nie, Liang Tan, Xiang Ren, Hamed Firooz

TL;DR
This paper introduces a saliency-aware multimodal knowledge distillation framework that improves model performance by focusing on modality-specific information and saliency-based weighting in vision-language tasks.
Contribution
It proposes a novel modality-specific distillation method with saliency-based weighting, enhancing knowledge transfer in multimodal models.
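The summary does not say how the saliency weights are computed. The sketch below shows one plausible scheme. It assumes that a modality's saliency for a single sample is how far the teacher's prediction moves, measured by KL divergence, when that modality is masked out. Those scores are then normalized into per-sample weights that sum to one. The function names (`saliency_weights`, `kl_divergence`) and this masking-based definition are hypothetical illustrations, not the authors' formulation.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def saliency_weights(full_logits, logits_without_modality):
    """Hypothetical per-sample modality weights (an assumption, not the paper's rule).

    full_logits: teacher logits on the complete multimodal input.
    logits_without_modality: maps a modality name to the teacher logits
        obtained when that modality is masked out.
    A modality's saliency is the KL shift its removal causes; the scores
    are normalized to sum to 1 (uniform if removing any modality changes nothing).
    """
    p_full = softmax(full_logits)
    scores = {
        name: max(0.0, kl_divergence(p_full, softmax(masked)))
        for name, masked in logits_without_modality.items()
    }
    total = sum(scores.values())
    if total == 0.0:
        return {name: 1.0 / len(scores) for name in scores}
    return {name: s / total for name, s in scores.items()}

# Example: masking the image changes the teacher's prediction much more than masking the text.
teacher_full = [2.0, 0.5, -1.0]
teacher_masked = {"image": [0.2, 0.1, 0.0], "text": [1.9, 0.6, -0.9]}
weights = saliency_weights(teacher_full, teacher_masked)
print(weights)
```

With these illustrative numbers, the image term receives most of the weight for this sample. Normalizing by the sum keeps the auxiliary terms on a comparable scale across examples.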
Findings
MSD outperforms traditional KD on four multimodal datasets.
Saliency-based weighting improves the effectiveness of knowledge distillation.
Modality-specific analysis reveals the importance of different modalities in KD.
Abstract
To reduce model size while retaining performance, we often rely on knowledge distillation (KD), which transfers knowledge from a large "teacher" model to a smaller "student" model. However, KD on multimodal datasets such as vision-language tasks is relatively unexplored, and digesting multimodal information is challenging because different modalities present different types of information. In this paper, we perform a large-scale empirical study to investigate the importance and effects of each modality in knowledge distillation. Furthermore, we introduce a multimodal knowledge distillation framework, modality-specific distillation (MSD), which transfers knowledge from a teacher on multimodal tasks by learning the teacher's behavior within each modality. The idea is to mimic the teacher's modality-specific predictions by introducing an auxiliary loss term for each modality. Furthermore, because…
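The auxiliary-loss idea can be illustrated with a small, framework-free sketch. It assumes Hinton-style distillation (temperature-softened KL divergence scaled by T²) for the main term. It also assumes each modality-specific view is produced by keeping one modality and masking the rest. `msd_loss`, `alpha`, `temperature`, and the view names `"full"`, `"image"`, and `"text"` are hypothetical, and the exact loss used in the paper may differ.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable, temperature-scaled softmax."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(label, logits):
    """Negative log-likelihood of the gold label."""
    return -math.log(softmax(logits)[label])

def kd_term(teacher_logits, student_logits, temperature):
    """Hinton-style distillation term: T^2 * KL(teacher_T || student_T)."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / max(qi, 1e-12)) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl

def msd_loss(label, teacher, student, modality_weights, alpha=1.0, temperature=2.0):
    """Hypothetical MSD-style objective for one example.

    teacher, student: dicts mapping an input view to logits. "full" is the
        multimodal input; every other key (e.g. "image", "text") is a view in
        which only that modality is kept and the others are masked (assumed).
    modality_weights: weight of each auxiliary, modality-specific KD term.
    """
    loss = cross_entropy(label, student["full"])
    loss += alpha * kd_term(teacher["full"], student["full"], temperature)
    for modality, weight in modality_weights.items():
        loss += weight * kd_term(teacher[modality], student[modality], temperature)
    return loss

teacher = {"full": [2.0, 0.0], "image": [1.0, 0.0], "text": [0.5, 0.0]}
student = {"full": [1.5, 0.2], "image": [0.0, 1.0], "text": [0.4, 0.1]}
print(msd_loss(0, teacher, student, {"image": 0.5, "text": 0.5}))
```

When teacher and student agree on every view, all distillation terms vanish and the loss reduces to the cross-entropy on the full input. Fixed `modality_weights` are used here for simplicity. A saliency-aware variant would compute them per example instead.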
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling
Methods: Knowledge Distillation
