Multimodal Knowledge Expansion

Zihui Xue; Sucheng Ren; Zhengqi Gao; Hang Zhao

arXiv:2103.14431 · cs.CV · November 1, 2021 · 1 citation

Open Access · 1 Repo

TL;DR

This paper introduces multimodal knowledge expansion (MKE), a knowledge distillation-based framework that uses unlabeled multimodal data to transfer a pre-trained unimodal network to the same task in a multimodal setting, without requiring labels.

Contribution

The paper proposes a knowledge distillation approach in which a multimodal student model denoises pseudo labels and generalizes better than its teacher, and it relates this behavior to semi-supervised learning.

Findings

01. Multimodal student models consistently denoise pseudo labels.

02. MKE improves performance across four tasks and different modalities.

03. Theoretical analysis explains the denoising mechanism.

Abstract

The popularity of multimodal sensors and the accessibility of the Internet have brought us a massive amount of unlabeled multimodal data. Since existing datasets and well-trained models are primarily unimodal, the modality gap between a unimodal network and unlabeled multimodal data poses an interesting problem: how to transfer a pre-trained unimodal network to perform the same task on unlabeled multimodal data? In this work, we propose multimodal knowledge expansion (MKE), a knowledge distillation-based framework to effectively utilize multimodal data without requiring labels. Opposite to traditional knowledge distillation, where the student is designed to be lightweight and inferior to the teacher, we observe that a multimodal student model consistently denoises pseudo labels and generalizes better than its teacher. Extensive experiments on four tasks and different modalities verify…
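The setup described in the abstract can be illustrated with a small toy example. This is a minimal sketch under stated assumptions, not the authors' implementation: the linear teacher, the logistic-regression student, the early-fusion concatenation, the synthetic data, and every name below (teacher_predict, train_student, x_a, x_b) are hypothetical and chosen only to show the data flow, in which a unimodal teacher labels unlabeled multimodal samples and a multimodal student is trained on those pseudo labels.

```python
import numpy as np


def teacher_predict(x_a, w_teacher):
    """Hypothetical pre-trained unimodal teacher: a linear classifier that
    sees modality A only and emits hard pseudo labels in {0, 1}."""
    return (x_a @ w_teacher > 0).astype(np.float64)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_student(x_a, x_b, pseudo_labels, lr=0.1, steps=200):
    """Fit a logistic-regression student on both modalities, using the
    teacher's pseudo labels as targets (no ground-truth labels involved)."""
    # Early fusion by concatenation is an assumption made for this sketch.
    x = np.concatenate([x_a, x_b], axis=1)
    w = np.zeros(x.shape[1])
    eps = 1e-12  # avoids log(0)
    losses = []
    for _ in range(steps):
        p = sigmoid(x @ w)
        # Binary cross-entropy against the pseudo labels.
        loss = -np.mean(
            pseudo_labels * np.log(p + eps)
            + (1.0 - pseudo_labels) * np.log(1.0 - p + eps)
        )
        losses.append(loss)
        # Gradient of the mean cross-entropy with respect to w.
        grad = x.T @ (p - pseudo_labels) / len(pseudo_labels)
        w -= lr * grad
    return w, losses


# Synthetic "unlabeled multimodal data": two feature views per sample.
rng = np.random.default_rng(0)
n = 256
x_a = rng.normal(size=(n, 4))  # modality the teacher was trained on
x_b = rng.normal(size=(n, 3))  # extra modality available only to the student
w_teacher = rng.normal(size=4)  # stands in for pre-trained teacher weights

pseudo = teacher_predict(x_a, w_teacher)
w_student, losses = train_student(x_a, x_b, pseudo)
```

The toy data carries no real signal in the second modality, so this sketch shows only the training pipeline; it does not reproduce the denoising or generalization behavior reported in the paper.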

Code & Models

Repositories

zihuixue/mke
Official

Taxonomy

Topics: Music and Audio Processing · Anomaly Detection Techniques and Applications · Speech and Audio Processing