Single-Teacher View Augmentation: Boosting Knowledge Distillation via Angular Diversity
Seonghoon Yu, Dongjun Nam, Dina Katabi, Jeany Son

TL;DR
This paper introduces a cost-effective method to enhance knowledge distillation by creating diverse teacher views through angular diversity objectives, improving student performance without multiple teachers.
Contribution
The paper proposes a single-teacher multi-view augmentation technique based on angular diversity objectives. It boosts distillation effectiveness while avoiding the computational cost of training multiple teacher networks.
Findings
The method outperforms existing knowledge augmentation methods.
It is compatible with various KD frameworks and improves generalization.
In theory, it reduces the upper bound on the ensemble loss.
Abstract
Knowledge Distillation (KD) aims to train a lightweight student model by transferring knowledge from a large, high-capacity teacher. Recent studies have shown that leveraging diverse teacher perspectives can significantly improve distillation performance; however, achieving such diversity typically requires multiple teacher networks, leading to high computational costs. In this work, we propose a novel cost-efficient knowledge augmentation method for KD that generates diverse multi-views by attaching multiple branches to a single teacher. To ensure meaningful semantic variation across multi-views, we introduce two angular diversity objectives: 1) constrained inter-angle diversify loss, which maximizes angles between augmented views while preserving proximity to the original teacher output, and 2) intra-angle diversify loss, which encourages an even distribution of views around the…
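The first objective described in the abstract (spreading the augmented views apart in angle while keeping each one close to the original teacher output) can be illustrated with a small NumPy sketch. This is not the authors' formulation: the function name `inter_angle_sketch`, the hinge-style proximity penalty, and the `max_angle` threshold are all assumptions made for illustration. The second objective is not sketched because its description is truncated above.

```python
import numpy as np


def _unit(x, eps=1e-12):
    """Scale vectors along the last axis to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def inter_angle_sketch(views, teacher, max_angle=np.pi / 6):
    """Hypothetical illustration, not the paper's loss.

    views:   (k, d) array, one augmented view per branch (k >= 2).
    teacher: (d,) array, the original teacher output.
    Returns (diversity, proximity):
      diversity: mean pairwise cosine similarity between views
                 (lower means larger angles between views).
      proximity: mean hinge penalty on the angle between each view
                 and the teacher beyond `max_angle` (an assumed constraint).
    """
    v = _unit(np.asarray(views, dtype=float))
    t = _unit(np.asarray(teacher, dtype=float))
    k = v.shape[0]

    # Pairwise cosine similarities between distinct views (upper triangle).
    cos = np.clip(v @ v.T, -1.0, 1.0)
    rows, cols = np.triu_indices(k, k=1)
    diversity = cos[rows, cols].mean()

    # Angle of each view to the teacher, penalized only past the threshold.
    angles = np.arccos(np.clip(v @ t, -1.0, 1.0))
    proximity = np.maximum(angles - max_angle, 0.0).mean()
    return diversity, proximity


if __name__ == "__main__":
    views = np.array([[1.0, 0.0], [0.0, 1.0]])
    teacher = np.array([1.0, 1.0])
    print(inter_angle_sketch(views, teacher))
```

Minimizing a weighted sum of the two returned terms would push views apart while discouraging drift from the teacher; how the paper actually balances or constrains these terms is not stated in the text above.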
