Multi-Modality Distillation via Learning the teacher's modality-level Gram Matrix
Peng Liu

TL;DR
This paper introduces a novel multi-modality knowledge distillation method that models the teacher's modality relationships using Gram matrices to improve student learning.
Contribution
It proposes a new paradigm for distillation by learning the teacher's modality-level Gram matrix, capturing inter-modality relationships.
Findings
Enhances transfer of modality relationship information from teacher to student.
Addresses limitations of previous methods focusing only on final outputs.
Improves student network performance by modeling modality relationships.
Abstract
Existing multi-modality knowledge distillation methods mainly focus on learning only the teacher's final output. As a result, substantial differences remain between the teacher network and the student network. It is therefore necessary to force the student network to learn the modality relationship information of the teacher network. To transfer knowledge from teacher to student effectively, a novel modality relation distillation paradigm is adopted that models the relationship information among different modalities, namely learning the teacher's modality-level Gram matrix.
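The abstract does not say how the modality-level Gram matrix is built or which loss compares teacher and student, so the following is only a minimal sketch under stated assumptions: each network yields one feature vector per modality, the Gram matrix holds the pairwise inner products of the L2-normalized modality features, and the student is penalized by the mean squared difference between its Gram matrix and the teacher's. The function names `modality_gram` and `gram_distillation_loss`, the normalization, and the squared-error loss are illustrative choices, not the paper's confirmed formulation.

```python
import numpy as np


def modality_gram(features, eps=1e-8):
    """Return one (M, M) Gram matrix per sample.

    features: array of shape (batch, num_modalities, dim), i.e. one
    feature vector per modality (assumed layout, not from the paper).
    """
    # L2-normalize each modality vector so that entries are cosine similarities.
    norms = np.linalg.norm(features, axis=-1, keepdims=True)
    unit = features / np.maximum(norms, eps)
    # Batched matrix product: (B, M, D) @ (B, D, M) -> (B, M, M).
    return unit @ np.swapaxes(unit, -1, -2)


def gram_distillation_loss(teacher_feats, student_feats):
    """Mean squared difference between teacher and student Gram matrices.

    The squared-error form is an assumption made for this sketch.
    """
    g_teacher = modality_gram(teacher_feats)
    g_student = modality_gram(student_feats)
    return float(np.mean((g_teacher - g_student) ** 2))


# Toy data: 4 samples, 3 modalities; the teacher uses wider features than the student.
rng = np.random.default_rng(0)
teacher = rng.normal(size=(4, 3, 128))
student = rng.normal(size=(4, 3, 32))

loss = gram_distillation_loss(teacher, student)
print(modality_gram(teacher).shape, loss)
```

One property of this formulation worth noting: the Gram matrix has shape (M, M) regardless of the feature dimension, so teacher and student can be compared directly even when their feature widths differ, with no projection layer needed. In practice such a term would typically be added to an output-level distillation loss rather than replace it.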
