Learning Efficient Vision Transformers via Fine-Grained Manifold Distillation

Zhiwei Hao; Jianyuan Guo; Ding Jia; Kai Han; Yehui Tang; Chao Zhang; Han Hu; Yunhe Wang

arXiv:2107.01378 · cs.CV · June 3, 2022 · 23 cites

TL;DR

This paper introduces a fine-grained manifold distillation method that compresses vision transformers into compact student models while retaining high accuracy on ImageNet and improving transfer learning performance, making them better suited to edge devices.

Contribution

It proposes a novel patch-level manifold distillation technique specifically designed for vision transformers, reducing computational costs while maintaining high accuracy.

Findings

01

A DeiT-Tiny student trained with the method achieves 76.5% top-1 accuracy on ImageNet-1k.

02

The method outperforms previous distillation approaches by 2.0 percentage points in top-1 accuracy.

03

Demonstrates superior transfer learning results on various benchmarks.

Abstract

In the past few years, transformers have achieved promising performance on various computer vision tasks. Unfortunately, the immense inference overhead of most existing vision transformers prevents them from being deployed on edge devices such as cell phones and smart watches. Knowledge distillation is a widely used paradigm for compressing cumbersome architectures by transferring information to a compact student. However, most existing distillation methods are designed for convolutional neural networks (CNNs) and do not fully exploit the characteristics of vision transformers (ViTs). In this paper, we utilize the patch-level information and propose a fine-grained manifold distillation method. Specifically, we train a tiny student model to match a pre-trained teacher model in the patch-level manifold space. Then, we decouple the manifold matching loss into three terms with careful design to further reduce…
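
As a rough illustration of what matching a teacher in a patch-level manifold space could look like, the NumPy sketch below builds a pairwise cosine-similarity map over all patch embeddings in a batch and penalizes the squared difference between the student's map and the teacher's. The function names `relation_map` and `manifold_matching_loss`, the choice of cosine similarity, and the squared-error penalty are assumptions made for this example, not details taken from the abstract; the three-term decoupling mentioned above is not shown.

```python
import numpy as np


def relation_map(features: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between all patch embeddings in a batch.

    features has shape (batch, patches, dim); the result has shape
    (batch * patches, batch * patches). Hypothetical helper for illustration.
    """
    flat = features.reshape(-1, features.shape[-1])
    norms = np.linalg.norm(flat, axis=1, keepdims=True)
    flat = flat / np.maximum(norms, 1e-12)  # guard against zero vectors
    return flat @ flat.T


def manifold_matching_loss(student: np.ndarray, teacher: np.ndarray) -> float:
    """Squared Frobenius distance between student and teacher relation maps.

    Assumed loss form; the embedding widths may differ because only the
    (patches x patches) similarity structure is compared.
    """
    diff = relation_map(student) - relation_map(teacher)
    return float(np.sum(diff ** 2))


# Toy usage: 2 images, 4 patches each; teacher width 16, student width 8.
rng = np.random.default_rng(0)
teacher_feats = rng.normal(size=(2, 4, 16))
student_feats = rng.normal(size=(2, 4, 8))
loss = manifold_matching_loss(student_feats, teacher_feats)
```

Because the comparison happens on similarity maps rather than on raw features, no projection layer is needed to align the student and teacher widths. The full map has (batch × patches)² entries, so its cost grows quadratically with the number of patches in the batch.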

Code & Models

Repositories

Hao840/manifold-distillation
pytorch

Videos

Learning Efficient Vision Transformers via Fine-Grained Manifold Distillation · slideslive

Taxonomy

Topics

Advanced Neural Network Applications · Visual Attention and Saliency Detection · Machine Learning and ELM

Methods

Attention Is All You Need · Linear Layer · Layer Normalization · Residual Connection · Softmax · Dense Connections · Multi-Head Attention · Vision Transformer · Knowledge Distillation