Distillation Dynamics: Towards Understanding Feature-Based Distillation in Vision Transformers
Huiyuan Tian, Bonan Xu, Shijian Li

TL;DR
This paper analyzes why feature-based knowledge distillation fails for Vision Transformers, traces the failure to a fundamental representational mismatch between teacher and student models, and offers insights for designing better ViT compression methods.
Contribution
It introduces a novel "distillation dynamics" framework for analyzing why feature distillation fails in ViTs, and identifies a representational mismatch between teacher and student models as the root cause.
Findings
ViTs exhibit a U-shaped information processing pattern: initial compression followed by expansion.
Distributed, high-dimensional encoding in teacher models leads to negative transfer during feature distillation.
Naive feature mimicry harms student performance.
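To make the two families of objectives concrete, here is a minimal sketch contrasting a logit-based distillation loss with a naive feature-mimicry loss. It is not the paper's implementation: the function names, the temperature value, and the linear projection used to match student and teacher widths are all illustrative assumptions.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def logit_kd_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened outputs, scaled by T^2.

    The temperature of 2.0 is an assumed default, not a value from the paper.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

def project(features, weight):
    """Hypothetical linear projector: weight has shape (d_teacher, d_student)."""
    return [sum(w * f for w, f in zip(row, features)) for row in weight]

def feature_mimicry_loss(student_feat, teacher_feat, weight):
    """Naive mimicry: mean squared error between projected student and teacher features."""
    projected = project(student_feat, weight)
    return sum((a - b) ** 2 for a, b in zip(projected, teacher_feat)) / len(teacher_feat)

# Toy example: a 2-dimensional student feature matched against a 3-dimensional teacher feature.
W = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
print(logit_kd_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0]))
print(feature_mimicry_loss([1.0, 2.0], [1.0, 2.0, 3.0], W))
```

In the toy example the projector cannot reach the teacher's third dimension, so the mimicry loss stays nonzero no matter how the student's two features are set. This only illustrates a capacity gap; it is not evidence for the paper's findings.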
Abstract
While feature-based knowledge distillation has proven highly effective for compressing CNNs, these techniques unexpectedly fail when applied to Vision Transformers (ViTs), often performing worse than simple logit-based distillation. We provide the first comprehensive analysis of this phenomenon through a novel analytical framework termed "distillation dynamics", which combines frequency spectrum analysis, information entropy metrics, and activation magnitude tracking. Our investigation reveals that ViTs exhibit a distinctive U-shaped information processing pattern: initial compression followed by expansion. We identify the root cause of negative transfer in feature distillation: a fundamental representational paradigm mismatch between teacher and student models. Through frequency-domain analysis, we show that teacher models employ distributed, high-dimensional encoding strategies in later…
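The three diagnostics named in the abstract can be illustrated with a small per-layer probe. The abstract does not give the paper's exact metric definitions, so everything below is an assumption: the spectrum is a plain discrete Fourier transform of a 1-D feature vector, the entropy is Shannon entropy of the normalized power spectrum, and the magnitude is a mean absolute activation. All function names are hypothetical.

```python
import cmath
import math

def power_spectrum(signal):
    """Squared magnitude of the discrete Fourier transform (naive O(n^2) DFT)."""
    n = len(signal)
    spectrum = []
    for k in range(n):
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(signal))
        spectrum.append(abs(coeff) ** 2)
    return spectrum

def shannon_entropy(weights):
    """Entropy in bits of non-negative weights after normalizing them to sum to 1."""
    total = sum(weights)
    if total == 0:
        return 0.0
    probs = [w / total for w in weights if w > 0]
    return -sum(p * math.log2(p) for p in probs)

def mean_abs_activation(features):
    """Average absolute activation value."""
    return sum(abs(x) for x in features) / len(features)

def layer_report(features):
    """Assumed per-layer summary: spectral entropy and activation magnitude."""
    return {
        "spectral_entropy": shannon_entropy(power_spectrum(features)),
        "mean_abs": mean_abs_activation(features),
    }

# A constant vector puts all its energy in one frequency (entropy near 0 bits);
# an impulse spreads energy evenly over all four frequencies (entropy of 2 bits).
print(layer_report([1.0, 1.0, 1.0, 1.0]))
print(layer_report([1.0, 0.0, 0.0, 0.0]))
```

Under these assumed definitions, applying `layer_report` to each layer's features and plotting the entropy against depth is one way to look for a compression-then-expansion profile like the one the abstract describes.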
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Advanced Neural Network Applications
