Multimodal Knowledge Expansion
Zihui Xue, Sucheng Ren, Zhengqi Gao, Hang Zhao

TL;DR
This paper introduces multimodal knowledge expansion (MKE), a framework that leverages unlabeled multimodal data through knowledge distillation, enabling improved performance of pre-trained unimodal networks on new multimodal tasks.
Contribution
The paper proposes a knowledge distillation approach in which a multimodal student model denoises the pseudo labels produced by its unimodal teacher and outperforms that teacher; the paper also connects this setting to semi-supervised learning.
Findings
Multimodal student models denoise pseudo labels effectively.
MKE improves performance across four tasks spanning different modalities.
Theoretical analysis explains the denoising mechanism.
Abstract
The popularity of multimodal sensors and the accessibility of the Internet have brought us a massive amount of unlabeled multimodal data. Since existing datasets and well-trained models are primarily unimodal, the modality gap between a unimodal network and unlabeled multimodal data poses an interesting problem: how can a pre-trained unimodal network be transferred to perform the same task on unlabeled multimodal data? In this work, we propose multimodal knowledge expansion (MKE), a knowledge distillation-based framework that effectively utilizes multimodal data without requiring labels. In contrast to traditional knowledge distillation, where the student is designed to be lightweight and inferior to the teacher, we observe that a multimodal student model consistently denoises pseudo labels and generalizes better than its teacher. Extensive experiments on four tasks and different modalities verify…
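The abstract describes the MKE data flow at a high level: a pre-trained unimodal teacher labels unlabeled multimodal data, and a multimodal student is trained on those pseudo labels. The Python below is a minimal, hypothetical sketch of that flow on a toy binary task, assuming two scalar modalities, a thresholding "teacher" that sees only modality a, hard pseudo labels, and a logistic-regression student fit by full-batch gradient descent. All names (`make_unlabeled_pairs`, `teacher_predict`, `train_student`) are illustrative assumptions, not the authors' implementation, and the toy makes no claim about reproducing the paper's denoising results.

```python
import math
import random


def make_unlabeled_pairs(n, seed=0):
    """Toy multimodal data: (modality a, modality b, hidden label).

    Hypothetical setup; the hidden label is kept only for evaluation
    and is never used for training.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        y = rng.randint(0, 1)
        center = 1.0 if y == 1 else -1.0
        x_a = center + rng.gauss(0.0, 1.0)  # modality the teacher was trained on
        x_b = center + rng.gauss(0.0, 1.0)  # extra modality, unseen by the teacher
        samples.append((x_a, x_b, y))
    return samples


def teacher_predict(x_a):
    """Hypothetical stand-in for a pre-trained unimodal network (modality a only)."""
    return 1 if x_a > 0.0 else 0


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_student(inputs, pseudo_labels, epochs=200, lr=0.1):
    """Multimodal student: logistic regression on (x_a, x_b) fit to pseudo labels."""
    w_a = w_b = bias = 0.0
    n = len(inputs)
    for _ in range(epochs):
        g_a = g_b = g_bias = 0.0
        for (x_a, x_b), t in zip(inputs, pseudo_labels):
            err = sigmoid(w_a * x_a + w_b * x_b + bias) - t  # d(BCE)/d(logit)
            g_a += err * x_a
            g_b += err * x_b
            g_bias += err
        # Full-batch gradient descent step on the mean binary cross-entropy.
        w_a -= lr * g_a / n
        w_b -= lr * g_b / n
        bias -= lr * g_bias / n
    return w_a, w_b, bias


def accuracy(pred, truth):
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


# 1) Unlabeled multimodal data (hidden labels held out for evaluation only).
data = make_unlabeled_pairs(1000)
inputs = [(x_a, x_b) for x_a, x_b, _ in data]
hidden_labels = [y for _, _, y in data]

# 2) The unimodal teacher labels the multimodal data using modality a only.
pseudo = [teacher_predict(x_a) for x_a, _ in inputs]

# 3) The multimodal student sees both modalities and is fit to the pseudo labels.
w_a, w_b, bias = train_student(inputs, pseudo)
student_pred = [1 if w_a * x_a + w_b * x_b + bias > 0 else 0 for x_a, x_b in inputs]

teacher_acc = accuracy(pseudo, hidden_labels)
student_acc = accuracy(student_pred, hidden_labels)
print(f"teacher acc={teacher_acc:.3f}  student acc={student_acc:.3f}")
```

Replacing the threshold teacher with a real pre-trained network and the logistic regression with a multimodal network gives the general shape of the setting; in this sketch the hidden labels serve only to compare teacher and student and never enter training.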
Taxonomy
Topics: Music and Audio Processing · Anomaly Detection Techniques and Applications · Speech and Audio Processing
