TransKD: Transformer Knowledge Distillation for Efficient Semantic Segmentation
Ruiping Liu, Kailun Yang, Alina Roitberg, Jiaming Zhang, Kunyu Peng, Huayao Liu, Yaonan Wang, Rainer Stiefelhagen

TL;DR
TransKD introduces a transformer knowledge distillation framework that significantly reduces computational costs and training time for semantic segmentation by distilling feature maps and patch embeddings from large teacher transformers to compact student models.
Contribution
The paper presents a novel transformer-based knowledge distillation framework that bypasses extensive pre-training and reduces FLOPs by over 85%, improving efficiency in semantic segmentation.
Findings
TransKD outperforms existing distillation methods on multiple datasets.
It reduces training time and computational costs significantly.
It achieves competitive or superior segmentation accuracy with smaller models.
Abstract
Semantic segmentation benchmarks in the realm of autonomous driving are dominated by large pre-trained transformers, yet their widespread adoption is impeded by substantial computational costs and prolonged training durations. To lift this constraint, we look at efficient semantic segmentation from a perspective of comprehensive knowledge distillation and aim to bridge the gap between multi-source knowledge extractions and transformer-specific patch embeddings. We put forward the Transformer-based Knowledge Distillation (TransKD) framework which learns compact student transformers by distilling both feature maps and patch embeddings of large teacher transformers, bypassing the long pre-training process and reducing the FLOPs by >85.0%. Specifically, we propose two fundamental modules to realize feature map distillation and patch embedding distillation, respectively: (1) Cross Selective…
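The idea of distilling both feature maps and patch embeddings can be illustrated with a minimal sketch. This is not the TransKD implementation: the two modules named in the abstract are not reproduced here, and a plain linear projection with a mean-squared-error matching term stands in for them. All function names, shapes, and the weights `alpha` and `beta` are hypothetical, chosen only for illustration.

```python
# Hypothetical sketch: names, shapes, and weights are illustrative assumptions,
# not the TransKD implementation. A linear projection plus MSE stands in for
# the paper's distillation modules.

def project(tokens, weight):
    """Linearly map each student token (width C_s) to the teacher width C_t.

    `weight` is a C_s x C_t matrix given as a list of rows.
    """
    return [
        [sum(x * w for x, w in zip(token, column)) for column in zip(*weight)]
        for token in tokens
    ]


def mse(a, b):
    """Mean squared error between two equally shaped lists of tokens."""
    diffs = [(x - y) ** 2 for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)


def distillation_loss(task_loss, student_feat, teacher_feat, feat_proj,
                      student_embed, teacher_embed, embed_proj,
                      alpha=1.0, beta=1.0):
    """Task loss plus weighted feature-map and patch-embedding matching terms."""
    feat_term = mse(project(student_feat, feat_proj), teacher_feat)
    embed_term = mse(project(student_embed, embed_proj), teacher_embed)
    return task_loss + alpha * feat_term + beta * embed_term


# Toy data: 2 tokens, student width 2, teacher width 3.
student = [[1.0, 0.0], [0.0, 1.0]]
teacher = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
perfect_proj = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  # maps student exactly onto teacher
zero_proj = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]     # maps every student token to zeros

loss_matched = distillation_loss(0.5, student, teacher, perfect_proj,
                                 student, teacher, perfect_proj)
loss_unmatched = distillation_loss(0.5, student, teacher, zero_proj,
                                   student, teacher, zero_proj)
```

In this toy setup the student is narrower than the teacher, so each student token is projected to the teacher's width before comparison. When the projected student matches the teacher, only the task loss remains; any mismatch in feature maps or patch embeddings adds a penalty on top of it.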
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Autonomous Vehicle Technology and Safety
Methods: Dilated Convolution · Softmax · Selective Kernel Convolution · Batch Normalization · 1x1 Convolution · Selective Kernel · Knowledge Distillation
