Contrastive Representation Distillation via Multi-Scale Feature Decoupling
Cuipeng Wang, Haipeng Wang

TL;DR
This paper introduces MSDCRD, a distillation framework that decouples global features into multi-scale local features and uses contrastive learning to transfer knowledge efficiently, without relying on an external memory buffer.
Contribution
It proposes a model-agnostic method that decouples features into multi-scale local parts and employs contrastive losses, improving distillation efficiency and effectiveness.
Findings
Reports superior performance in both homogeneous and heterogeneous teacher-student settings.
Eliminates the need for external memory buffers.
Demonstrates strong generalization across architectures.
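The two ideas summarized above (multi-scale local features, and contrastive learning without a memory buffer) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names, the grid scales (1, 2, 4), the temperature value, and the choice to draw negatives from the other rows of the current batch are all assumptions made for illustration. It also assumes student and teacher features already share a channel dimension; in practice a projection layer would usually be needed.

```python
import numpy as np


def multi_scale_local_features(fmap, scales=(1, 2, 4)):
    """Average-pool a (B, C, H, W) feature map over an s x s grid per scale.

    Hypothetical helper: returns an array of shape (B, N, C), where
    N = sum(s * s for s in scales). Scale 1 is the usual global feature.
    """
    b, c, h, w = fmap.shape
    parts = []
    for s in scales:
        if h % s or w % s:
            raise ValueError(f"H and W must be divisible by scale {s}")
        # Split H and W into s blocks each, then average inside every block.
        pooled = fmap.reshape(b, c, s, h // s, s, w // s).mean(axis=(3, 5))
        # (B, C, s, s) -> (B, s*s, C): one C-dimensional vector per region.
        parts.append(pooled.reshape(b, c, s * s).transpose(0, 2, 1))
    return np.concatenate(parts, axis=1)


def in_batch_info_nce(student, teacher, temperature=0.1, eps=1e-12):
    """InfoNCE loss between matching rows of two (M, D) arrays.

    Assumption: row i of `student` and row i of `teacher` form the positive
    pair; every other teacher row in the same batch acts as a negative, so
    no external memory buffer is involved.
    """
    s = student / (np.linalg.norm(student, axis=1, keepdims=True) + eps)
    t = teacher / (np.linalg.norm(teacher, axis=1, keepdims=True) + eps)
    logits = (s @ t.T) / temperature
    # Subtract the row maximum for a numerically stable log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


def local_contrastive_loss(student_fmap, teacher_fmap, scales=(1, 2, 4)):
    """Flatten all local features of the batch and contrast them row by row."""
    s_local = multi_scale_local_features(student_fmap, scales)
    t_local = multi_scale_local_features(teacher_fmap, scales)
    dim = s_local.shape[-1]
    return in_batch_info_nce(s_local.reshape(-1, dim), t_local.reshape(-1, dim))
```

Calling `local_contrastive_loss(student_fmap, teacher_fmap)` on two arrays of shape (B, C, H, W) returns a single scalar. Because negatives come only from the current batch, the cost grows with the batch size and the number of regions rather than with the size of a stored feature bank.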
Abstract
Knowledge distillation enhances the performance of compact student networks by transferring knowledge from more powerful teacher networks without introducing additional parameters. In the feature space, local regions within an individual global feature encode distinct yet interdependent semantic information. Previous feature-based distillation methods mainly emphasize global feature alignment while neglecting the decoupling of local regions within an individual global feature, which often results in semantic confusion and suboptimal performance. Moreover, conventional contrastive representation distillation suffers from low efficiency due to its reliance on a large memory buffer to store feature samples. To address these limitations, this work proposes MSDCRD, a model-agnostic distillation framework that systematically decouples global features into multi-scale local features and…
Taxonomy
Topics: Fault Detection and Control Systems · Face and Expression Recognition · Neural Networks and Applications
Methods: Focus
