Distilling Cross-Modal Knowledge via Feature Disentanglement

Junhong Liu; Yuan Zhang; Tao Huang; Wenchao Xu; Renyu Yang

arXiv:2511.19887 · cs.CV · November 26, 2025

TL;DR

This paper introduces a frequency-domain approach to cross-modal knowledge distillation, improving the transfer of knowledge between different modalities by decoupling and balancing features based on their frequency characteristics.

Contribution

It proposes a novel frequency-decoupled distillation method that enhances cross-modal knowledge transfer by leveraging frequency-domain features and addressing distributional shifts.

Findings

01 Significantly outperforms traditional KD methods

02 Effective in reducing cross-modal representation inconsistencies

03 Demonstrates robustness across multiple benchmark datasets

Abstract

Knowledge distillation (KD) has proven highly effective for compressing large models and enhancing the performance of smaller ones. However, its effectiveness diminishes in cross-modal scenarios, such as vision-to-language distillation, where inconsistent representations across modalities make knowledge transfer difficult. To address this challenge, we propose frequency-decoupled cross-modal knowledge distillation, a method designed to decouple and balance knowledge transfer across modalities by leveraging frequency-domain features. We observe that low-frequency features exhibit high consistency across different modalities, whereas high-frequency features demonstrate extremely low cross-modal similarity. Accordingly, we apply distinct losses to these features: enforcing strong alignment in the low-frequency domain and introducing relaxed alignment for high-frequency features.…
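
The decoupling idea in the abstract can be illustrated with a minimal NumPy sketch. The abstract is truncated here and does not name the exact losses or how the frequency split is computed, so everything below is an assumption for illustration: the FFT is taken along the feature axis, `keep_ratio` is a hypothetical cutoff, mean squared error stands in for "strong alignment", and a cosine distance (which ignores magnitude) stands in for "relaxed alignment". The function names and `high_weight` are likewise hypothetical and are not taken from the paper.

```python
import numpy as np


def split_frequencies(feat, keep_ratio=0.25):
    """Split features into low- and high-frequency parts along the last axis.

    keep_ratio is a hypothetical cutoff: the fraction of FFT bins treated as "low".
    """
    spectrum = np.fft.rfft(feat, axis=-1)
    cutoff = max(1, int(spectrum.shape[-1] * keep_ratio))
    low_spec = np.zeros_like(spectrum)
    low_spec[..., :cutoff] = spectrum[..., :cutoff]  # keep only the lowest bins
    high_spec = spectrum - low_spec                  # everything else
    n = feat.shape[-1]
    low = np.fft.irfft(low_spec, n=n, axis=-1)
    high = np.fft.irfft(high_spec, n=n, axis=-1)
    return low, high


def strong_alignment(student, teacher):
    """Assumed stand-in for strong alignment: mean squared error."""
    return float(np.mean((student - teacher) ** 2))


def relaxed_alignment(student, teacher, eps=1e-8):
    """Assumed stand-in for relaxed alignment: 1 - cosine similarity.

    This matches direction only, so magnitudes are free to differ.
    """
    num = np.sum(student * teacher, axis=-1)
    den = np.linalg.norm(student, axis=-1) * np.linalg.norm(teacher, axis=-1) + eps
    return float(np.mean(1.0 - num / den))


def frequency_decoupled_loss(student, teacher, keep_ratio=0.25, high_weight=0.1):
    """Combine both terms; high_weight is a hypothetical balancing factor."""
    s_low, s_high = split_frequencies(student, keep_ratio)
    t_low, t_high = split_frequencies(teacher, keep_ratio)
    return strong_alignment(s_low, t_low) + high_weight * relaxed_alignment(s_high, t_high)


# Usage: random arrays stand in for (batch, dim) features from two modalities.
rng = np.random.default_rng(0)
teacher_feat = rng.normal(size=(4, 16))
student_feat = rng.normal(size=(4, 16))
loss = frequency_decoupled_loss(student_feat, teacher_feat)
```

Because the two parts sum back to the original features, the split loses no information; only the penalty applied to each part differs.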

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis