Meta-Learned Modality-Weighted Knowledge Distillation for Robust Multi-Modal Learning with Missing Data

Hu Wang; Salma Hassan; Yuyuan Liu; Congbo Ma; Yuanhong Chen; Qing Li; Jiahui Geng; Bingjie Wang; Yu Tian; Yutong Xie; Jodie Avery; Louise Hull; Ian Reid; Mohammad Yaqub; Gustavo Carneiro

arXiv:2405.07155 · cs.CV · August 27, 2025

TL;DR

This paper introduces MetaKD, a meta-learning-based method that adaptively weights modalities for knowledge distillation, so that multi-modal models continue to perform well across various tasks even when some modalities are missing.

Contribution

The paper proposes MetaKD, a novel meta-learned modality-weighted knowledge distillation approach that enhances robustness of multi-modal models against missing modalities across multiple tasks.

Findings

01. MetaKD outperforms existing models on five datasets.
02. MetaKD maintains high accuracy with missing modalities.
03. The method is effective across segmentation and classification tasks.

Abstract

In multi-modal learning, some modalities are more influential than others, and their absence can have a significant impact on classification/segmentation accuracy. Addressing this challenge, we propose a novel approach called Meta-learned Modality-weighted Knowledge Distillation (MetaKD), which enables multi-modal models to maintain high accuracy even when key modalities are missing. MetaKD adaptively estimates the importance weight of each modality through a meta-learning process. These learned importance weights guide a pairwise modality-weighted knowledge distillation process, allowing high-importance modalities to transfer knowledge to lower-importance ones, resulting in robust performance despite missing inputs. Unlike previous methods in the field, which are often task-specific and require significant modifications, our approach is designed to work in multiple tasks (e.g.,…
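The pairwise modality-weighted distillation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the softmax normalization of importance scores, and the mean-absolute-difference distance are all assumptions chosen for illustration, and the meta-learning step that would produce the importance scores is not shown. In a real training loop the teacher features would typically be held fixed (for example, detached from the gradient graph) so that only the student modality is updated.

```python
import math


def softmax(logits):
    """Turn unconstrained scores into positive weights that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def l1_distance(a, b):
    """Mean absolute difference between two equal-length feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def weighted_pairwise_kd_loss(features, importance_logits):
    """Hypothetical pairwise modality-weighted distillation loss.

    features: dict mapping modality name -> feature vector (list of floats)
    importance_logits: dict mapping modality name -> importance score
        (assumed here to come from a separate meta-learning step)

    Each modality acts as a teacher for every other modality, and the
    distance term is scaled by the teacher's importance weight, so
    high-importance modalities contribute more to the loss.
    """
    names = sorted(features)
    weights = dict(zip(names, softmax([importance_logits[n] for n in names])))
    loss = 0.0
    for teacher in names:
        for student in names:
            if teacher != student:
                loss += weights[teacher] * l1_distance(
                    features[student], features[teacher]
                )
    return loss, weights


# Illustrative call with made-up modality names and values.
features = {"mod_a": [1.0, 0.0], "mod_b": [0.0, 0.0], "mod_c": [1.0, 1.0]}
logits = {"mod_a": 2.0, "mod_b": 0.0, "mod_c": 0.5}
loss, weights = weighted_pairwise_kd_loss(features, logits)
```

Because the weights are normalized, raising one modality's importance score increases its influence as a teacher at the expense of the others, which is the qualitative behaviour the abstract attributes to the learned importance weights.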

Code & Models

Repositories

billhhh/MCKD (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques · Speech and dialogue systems · Multimodal Machine Learning Applications

Methods: Knowledge Distillation