Cross-Modal Distillation For Widely Differing Modalities

Cairong Zhao; Yufeng Jin; Zifan Song; Haonan Chen; Duoqian Miao; Guosheng Hu

arXiv:2507.16296 · cs.AI · October 7, 2025

TL;DR

This paper introduces a cross-modal distillation framework that enables knowledge transfer between vastly different modalities like images, text, and speech, addressing domain gaps and overfitting issues in multi-modal learning.

Contribution

It proposes two soft-constrained distillation strategies and a quality-based adaptive weighting module to improve cross-modal knowledge transfer and robustness.
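
The summary does not say how the weighting module computes its weights. The following is a minimal sketch, assuming each training sample carries a scalar quality score (for example, a signal-to-noise estimate for a speech clip). One hypothetical way to turn those scores into per-sample distillation weights is a temperature-scaled softmax rescaled so the weights average to 1. The names `adaptive_weights` and `weighted_distill_loss` and the `temperature` parameter are illustrative only. They are not the authors' API or formulation.

```python
import math
from typing import List, Sequence


def adaptive_weights(quality: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Map per-sample quality scores to distillation weights.

    Hypothetical scheme (not the paper's): softmax over quality / temperature,
    rescaled so the weights average to 1 and the batch loss keeps its scale.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [q / temperature for q in quality]
    m = max(scaled)  # subtract the max before exp for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    n = len(exps)
    return [n * e / total for e in exps]


def weighted_distill_loss(
    per_sample_losses: Sequence[float],
    quality: Sequence[float],
    temperature: float = 1.0,
) -> float:
    """Batch mean of per-sample distillation losses, weighted by sample quality."""
    if len(per_sample_losses) != len(quality):
        raise ValueError("losses and quality scores must have the same length")
    w = adaptive_weights(quality, temperature)
    return sum(wi * li for wi, li in zip(w, per_sample_losses)) / len(w)
```

With equal quality scores every weight is 1, so the loss reduces to the plain batch mean. A lower temperature concentrates the weight on the highest-quality samples.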

Findings

1. Effective knowledge transfer across image, text, and speech modalities.

2. Reduced overfitting in cross-modal distillation.

3. Improved performance in speaker recognition and image classification.

Abstract

Deep learning has achieved great progress recently; however, it is neither easy nor efficient to further improve its performance by increasing the size of the model. Multi-modal learning can mitigate this challenge by introducing richer and more discriminative information as input. To address the limited access to multi-modal data at the time of use, we conduct multi-modal learning by introducing a teacher model that transfers discriminative knowledge to a student model during training. However, this knowledge transfer via distillation is not trivial, because the large domain gap between widely differing modalities can easily lead to overfitting. In this work, we introduce a cross-modal distillation framework. Specifically, we find that hard-constrained losses, e.g., an L2 loss forcing the student to be exactly the same as the teacher, can easily lead to overfitting in cross-modality distillation. To…
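
The distinction between hard and soft constraints can be made concrete by contrasting a squared L2 feature loss with one soft alternative. That alternative is a squared hinge on the L2 distance, which stops penalizing the student once its features lie within a margin of the teacher's. The soft variant and the names `soft_margin_loss` and `margin` are assumptions for illustration. The abstract is truncated before it describes the paper's own two soft-constrained strategies, so this should not be read as the authors' method.

```python
import math
from typing import Sequence


def l2_distance(student: Sequence[float], teacher: Sequence[float]) -> float:
    """Euclidean distance between a student and a teacher feature vector."""
    if len(student) != len(teacher):
        raise ValueError("feature vectors must have the same dimension")
    return math.sqrt(sum((s - t) ** 2 for s, t in zip(student, teacher)))


def hard_l2_loss(student: Sequence[float], teacher: Sequence[float]) -> float:
    """Hard constraint: squared L2 distance, zero only if student == teacher."""
    return l2_distance(student, teacher) ** 2


def soft_margin_loss(
    student: Sequence[float], teacher: Sequence[float], margin: float = 1.0
) -> float:
    """Illustrative soft constraint (hypothetical): squared hinge on the L2 distance.

    Student features within `margin` of the teacher incur no loss, so the
    student is not forced to copy the teacher exactly.
    """
    if margin < 0:
        raise ValueError("margin must be non-negative")
    return max(0.0, l2_distance(student, teacher) - margin) ** 2
```

For a student feature at distance 5 from the teacher (for example, `[0.0, 0.0]` against `[3.0, 4.0]`), the hard loss is 25. The soft loss with `margin=1.0` is 16, and it drops to 0 once the margin exceeds the distance. The dead zone is what leaves the student room to differ from a teacher trained on a very different modality.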

Taxonomy

Topics: Process Optimization and Integration · Advanced Control Systems Optimization