MSD: Saliency-aware Knowledge Distillation for Multimodal Understanding

Woojeong Jin; Maziar Sanjabi; Shaoliang Nie; Liang Tan; Xiang Ren; Hamed Firooz

arXiv:2101.01881 · cs.CV · October 25, 2021

TL;DR

This paper introduces a saliency-aware knowledge distillation framework for vision-language tasks: the student is trained to mimic the teacher's modality-specific predictions, with saliency-based weighting of the distillation terms.

Contribution

It proposes a novel modality-specific distillation method with saliency-based weighting, enhancing knowledge transfer in multimodal models.

Findings

01. MSD outperforms traditional KD on four multimodal datasets.

02. Saliency-based weighting improves the effectiveness of knowledge distillation.

03. Modality-specific analysis reveals the importance of different modalities in KD.

Abstract

To reduce model size while retaining performance, we often rely on knowledge distillation (KD), which transfers knowledge from a large "teacher" model to a smaller "student" model. However, KD on multimodal datasets, such as those for vision-language tasks, is relatively unexplored, and digesting multimodal information is challenging because different modalities present different types of information. In this paper, we perform a large-scale empirical study to investigate the importance and effects of each modality in knowledge distillation. Furthermore, we introduce a multimodal knowledge distillation framework, modality-specific distillation (MSD), which transfers knowledge from a teacher on multimodal tasks by learning the teacher's behavior within each modality. The idea is to mimic the teacher's modality-specific predictions by introducing an auxiliary loss term for each modality. Furthermore, because…
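
The idea of adding one auxiliary distillation term per modality can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `msd_style_loss`, the dictionary keys (`multimodal`, `image`, `text`), the use of KL divergence between temperature-softened predictions, and the fixed per-term weights are all assumptions made for illustration. In particular, how the modality-specific logits are produced and how the weights are chosen (for example, from saliency) are left outside the sketch.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def msd_style_loss(teacher_logits, student_logits, weights, temperature=1.0):
    """Weighted sum of distillation terms, one per entry in teacher_logits.

    teacher_logits and student_logits are dicts mapping a hypothetical key
    (e.g. 'multimodal', 'image', 'text') to the logits each model produces
    for that view of the input. weights maps the same keys to scalar
    weights; a missing key defaults to 1.0. All of this is an assumption
    for illustration, not the paper's exact formulation.
    """
    total = 0.0
    for key, t_logits in teacher_logits.items():
        p = softmax(t_logits, temperature)              # teacher distribution
        q = softmax(student_logits[key], temperature)   # student distribution
        total += weights.get(key, 1.0) * kl_divergence(p, q)
    return total


# Hypothetical logits for a binary task.
teacher = {"multimodal": [2.0, 0.5], "image": [1.0, 1.0], "text": [0.2, 1.5]}
student = {"multimodal": [1.0, 1.0], "image": [0.8, 1.1], "text": [0.5, 1.0]}
weights = {"multimodal": 1.0, "image": 0.5, "text": 0.5}
loss = msd_style_loss(teacher, student, weights, temperature=2.0)
```

The loss is zero when the student reproduces the teacher's distribution on every view and grows as any of the per-modality predictions diverge, so the student is penalized for disagreeing with the teacher within a single modality even when the two models agree on the combined input.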

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling

Methods: Knowledge Distillation