Distilling Cross-Modal Knowledge via Feature Disentanglement
Junhong Liu, Yuan Zhang, Tao Huang, Wenchao Xu, Renyu Yang

TL;DR
This paper introduces a frequency-domain approach to cross-modal knowledge distillation, improving the transfer of knowledge between different modalities by decoupling and balancing features based on their frequency characteristics.
Contribution
It proposes a novel frequency-decoupled distillation method that enhances cross-modal knowledge transfer by leveraging frequency-domain features and addressing distributional shifts.
Findings
Significantly outperforms traditional KD methods
Effective in reducing cross-modal representation inconsistencies
Demonstrates robustness across multiple benchmark datasets
Abstract
Knowledge distillation (KD) has proven highly effective for compressing large models and enhancing the performance of smaller ones. However, its effectiveness diminishes in cross-modal scenarios, such as vision-to-language distillation, where inconsistent representations across modalities make knowledge transfer difficult. To address this challenge, we propose frequency-decoupled cross-modal knowledge distillation, a method that decouples and balances knowledge transfer across modalities by leveraging frequency-domain features. We observe that low-frequency features are highly consistent across modalities, whereas high-frequency features show very low cross-modal similarity. Accordingly, we apply distinct losses to these features: enforcing strong alignment in the low-frequency domain and introducing relaxed alignment for high-frequency features.…
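The abstract describes the mechanism only at a high level (and the excerpt is truncated), so what follows is a minimal NumPy sketch under stated assumptions, not the authors' implementation. It assumes teacher and student features have already been projected to the same (C, H, W) shape. It separates low and high frequencies with a hard circular mask in the 2-D Fourier domain, where the `cutoff` value is a hypothetical choice. It uses mean-squared error for the strong low-frequency alignment and substitutes a margin-based hinge for the relaxed high-frequency alignment, because the paper's exact relaxed loss is not given here. The function names (`split_frequency`, `band_similarity`, `frequency_decoupled_kd_loss`) and all default weights are illustrative.

```python
import numpy as np


def split_frequency(feat, cutoff=0.1):
    """Split feature maps of shape (C, H, W) into low- and high-frequency parts.

    Hypothetical helper: a hard circular low-pass mask in the 2-D Fourier domain.
    `cutoff` is in cycles per sample (np.fft.fftfreq units, 0 to 0.5).
    """
    h, w = feat.shape[-2:]
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    low_mask = np.sqrt(fx ** 2 + fy ** 2) <= cutoff  # (H, W), broadcast over channels
    spectrum = np.fft.fft2(feat, axes=(-2, -1))
    low = np.fft.ifft2(spectrum * low_mask, axes=(-2, -1)).real
    high = feat - low  # by linearity of the FFT, the residual is the high-frequency band
    return low, high


def cosine_similarity(a, b, eps=1e-8):
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))


def band_similarity(feat_a, feat_b, cutoff=0.1):
    """Cosine similarity of two feature maps, measured separately per frequency band."""
    a_low, a_high = split_frequency(feat_a, cutoff)
    b_low, b_high = split_frequency(feat_b, cutoff)
    return cosine_similarity(a_low, b_low), cosine_similarity(a_high, b_high)


def frequency_decoupled_kd_loss(student, teacher, cutoff=0.1, margin=0.5,
                                w_low=1.0, w_high=0.1):
    """Illustrative frequency-decoupled distillation loss (assumed loss forms)."""
    s_low, s_high = split_frequency(student, cutoff)
    t_low, t_high = split_frequency(teacher, cutoff)
    # Strong alignment: plain mean-squared error on the low-frequency components.
    loss_low = float(np.mean((s_low - t_low) ** 2))
    # Relaxed alignment (assumed hinge form): only high-frequency MSE above `margin` is penalised.
    loss_high = max(0.0, float(np.mean((s_high - t_high) ** 2)) - margin)
    return w_low * loss_low + w_high * loss_high


# Usage with synthetic features (stand-ins for projected teacher/student maps).
rng = np.random.default_rng(0)
teacher = rng.standard_normal((8, 16, 16))
student = teacher + 0.1 * rng.standard_normal((8, 16, 16))
print(frequency_decoupled_kd_loss(student, teacher))
print(band_similarity(student, teacher))
```

The hinge lets the student's high-frequency features deviate from the teacher's by up to `margin` (in MSE) at no cost. This is one simple way to express "relaxed" alignment; a real implementation might instead align only statistics or normalised features of the high-frequency band. `band_similarity` is a probe for checking, on real teacher and student features, whether low-frequency components agree more across modalities than high-frequency ones. Synthetic data like the example above says nothing about that claim.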
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis
