TransKD: Transformer Knowledge Distillation for Efficient Semantic Segmentation

Ruiping Liu; Kailun Yang; Alina Roitberg; Jiaming Zhang; Kunyu Peng; Huayao Liu; Yaonan Wang; Rainer Stiefelhagen

arXiv:2202.13393 · cs.CV · September 6, 2024 · 27 cites

TL;DR

TransKD introduces a transformer knowledge distillation framework that significantly reduces computational costs and training time for semantic segmentation by distilling feature maps and patch embeddings from large teacher transformers to compact student models.

Contribution

The paper presents a novel transformer-based knowledge distillation framework that bypasses extensive pre-training and reduces FLOPs by over 85%, improving efficiency in semantic segmentation.

Findings

01

TransKD outperforms existing distillation methods on multiple datasets.

02

It reduces FLOPs by more than 85% and shortens training by bypassing the long pre-training process.

03

It achieves competitive or superior segmentation accuracy with smaller models.

Abstract

Semantic segmentation benchmarks in the realm of autonomous driving are dominated by large pre-trained transformers, yet their widespread adoption is impeded by substantial computational costs and prolonged training durations. To lift this constraint, we look at efficient semantic segmentation from a perspective of comprehensive knowledge distillation and aim to bridge the gap between multi-source knowledge extractions and transformer-specific patch embeddings. We put forward the Transformer-based Knowledge Distillation (TransKD) framework which learns compact student transformers by distilling both feature maps and patch embeddings of large teacher transformers, bypassing the long pre-training process and reducing the FLOPs by >85.0%. Specifically, we propose two fundamental modules to realize feature map distillation and patch embedding distillation, respectively: (1) Cross Selective…
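To make the idea of distilling both feature maps and patch embeddings concrete, here is a minimal sketch of a combined training objective. It is not the paper's method: the abstract above is truncated before the two proposed modules are described, so this sketch substitutes a plain linear channel projection and a mean squared error term for each knowledge source. The function names, the weights `alpha` and `beta`, and the `[channels][positions]` list layout are all assumptions made for illustration.

```python
def project_channels(feat, weight):
    """Map student channels to the teacher width, like a 1x1 convolution.

    feat:   [C_student][N] values, N = number of spatial positions or patches
    weight: [C_teacher][C_student] projection matrix (hypothetical, learned in practice)
    returns [C_teacher][N]
    """
    num_positions = len(feat[0])
    return [
        [sum(w * feat[c][n] for c, w in enumerate(row)) for n in range(num_positions)]
        for row in weight
    ]


def mse(a, b):
    """Mean squared error between two equally shaped [C][N] nested lists."""
    flat_a = [x for row in a for x in row]
    flat_b = [x for row in b for x in row]
    if len(flat_a) != len(flat_b):
        raise ValueError("shape mismatch between student and teacher tensors")
    return sum((x - y) ** 2 for x, y in zip(flat_a, flat_b)) / len(flat_a)


def distillation_loss(task_loss, s_feat, t_feat, s_embed, t_embed,
                      w_feat, w_embed, alpha=1.0, beta=1.0):
    """Task loss plus two distillation terms (assumed form, not from the paper).

    alpha weights the feature-map term, beta the patch-embedding term.
    """
    feat_term = mse(project_channels(s_feat, w_feat), t_feat)
    embed_term = mse(project_channels(s_embed, w_embed), t_embed)
    return task_loss + alpha * feat_term + beta * embed_term
```

In this sketch the teacher tensors are treated as fixed targets, and only the student (and the projection weights) would be updated during training. A real implementation would use a tensor library and the paper's own modules in place of the plain MSE terms.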

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Autonomous Vehicle Technology and Safety

Methods: Dilated Convolution · Softmax · Selective Kernel Convolution · Batch Normalization · 1x1 Convolution · Selective Kernel · Knowledge Distillation