SpectralKD: A Unified Framework for Interpreting and Distilling Vision Transformers via Spectral Analysis

Huiyuan Tian; Bonan Xu; Shijian Li; Gang Pan

arXiv:2412.19055·cs.CV·January 31, 2025

TL;DR

SpectralKD introduces a spectral analysis framework for understanding and improving knowledge distillation in Vision Transformers, achieving state-of-the-art results without additional trainable parameters.

Contribution

The paper presents a unified spectral analysis framework for ViTs and KD, revealing layer importance and spectral patterns, and proposes a simple spectral alignment method for effective knowledge distillation.

Findings

01. Model-wise analysis shows that CaiT concentrates information in its first and last few layers.

02. Layer-wise analysis shows that Swin Transformer and CaiT exhibit similar spectral encoding patterns despite their architectural differences.

03. The spectral alignment method improves top-1 accuracy on ImageNet-1K.
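
To make the first finding concrete, the sketch below ranks layers by a simple spectral intensity score. This page does not state how the paper measures intensity, so the score used here (mean magnitude of a 1D discrete Fourier transform over a flattened feature vector) is an assumption, and the layer names and activations are toy values, not data from the paper.

```python
import cmath
import math


def dft_magnitudes(signal):
    """Return |X[k]| for the discrete Fourier transform of a real sequence."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal)))
        for k in range(n)
    ]


def layer_spectral_intensity(feature):
    """Assumed score: mean DFT magnitude of one layer's flattened features."""
    mags = dft_magnitudes(feature)
    return sum(mags) / len(mags)


# Toy activations for three hypothetical layers (illustrative only).
layers = {
    "first": [4.0, -3.0, 5.0, -4.0],
    "middle": [0.1, 0.1, 0.1, 0.1],
    "last": [3.0, -2.0, 4.0, -3.0],
}

intensity = {name: layer_spectral_intensity(f) for name, f in layers.items()}

# Layers with the highest intensity would be the candidates for distillation.
ranked = sorted(intensity, key=intensity.get, reverse=True)
print(ranked)
```

Under this assumed score, a layer whose activations vary strongly ranks above a nearly constant one, which is the kind of signal that could guide which teacher layers to distill from.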

Abstract

Knowledge Distillation (KD) has achieved widespread success in compressing large Vision Transformers (ViTs), but a unified theoretical framework for both ViTs and KD is still lacking. In this paper, we propose SpectralKD, a novel unified analytical framework that offers deeper insights into ViTs and optimizes KD via spectral analysis. Our model-wise analysis reveals that CaiT concentrates information in its first and last few layers, which informs optimal layer selection for KD. Surprisingly, our layer-wise analysis discovers that Swin Transformer and CaiT exhibit similar spectral encoding patterns despite their architectural differences, leading to a feature map alignment guideline. Building on these insights, we propose a simple yet effective spectral alignment method for KD. Thanks to the deeper understanding these analyses provide, even such a simple strategy achieves…
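
As a rough illustration of what a parameter-free spectral alignment loss can look like, the sketch below compares teacher and student features in the frequency domain. It is a minimal sketch under stated assumptions, not the authors' implementation: the 1D DFT, the average pooling used to match feature lengths, and the function names `avg_pool` and `spectral_alignment_loss` are all hypothetical choices made for this example.

```python
import cmath
import math


def dft(signal):
    """Discrete Fourier transform of a real sequence (naive O(n^2) version)."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]


def avg_pool(signal, out_len):
    """Parameter-free downsampling; assumes len(signal) is a multiple of out_len."""
    step = len(signal) // out_len
    return [sum(signal[i * step:(i + 1) * step]) / step for i in range(out_len)]


def spectral_alignment_loss(student, teacher):
    """Mean squared error between student and teacher spectra.

    abs(s - t) ** 2 equals the squared real difference plus the squared
    imaginary difference, so both parts of the spectrum are matched.
    """
    teacher = avg_pool(teacher, len(student))  # assumed alignment step
    s_spec, t_spec = dft(student), dft(teacher)
    return sum(abs(s - t) ** 2 for s, t in zip(s_spec, t_spec)) / len(s_spec)


# Toy features (illustrative only): the teacher is twice as wide as the student.
teacher_feat = [1.0, 3.0, 2.0, 2.0, 0.0, 4.0, 5.0, 1.0]
matched_student = [2.0, 2.0, 2.0, 3.0]  # equals the pooled teacher
zero_student = [0.0, 0.0, 0.0, 0.0]

loss_matched = spectral_alignment_loss(matched_student, teacher_feat)
loss_zero = spectral_alignment_loss(zero_student, teacher_feat)
print(loss_matched, loss_zero)
```

Because pooling and the Fourier transform have no learnable weights, a loss of this shape adds no trainable parameters, which is consistent with the claim in the TL;DR. In practice such a term would be computed on framework tensors and added to the usual distillation objective.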

Code & Models

Repositories

thy960112/SpectralKD (PyTorch, official)

Taxonomy

Topics: Industrial Vision Systems and Defect Detection · CCD and CMOS Imaging Sensors · Neural Networks and Applications

Methods: Attention Is All You Need · Stochastic Depth · Byte Pair Encoding · Class Attention · Linear Layer · Absolute Position Encodings · Dropout · Softmax · Dense Connections · Residual Connection