Training data-efficient image transformers & distillation through attention
Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou

TL;DR
This paper introduces a data-efficient, convolution-free vision transformer trained on ImageNet alone, on a single computer in less than three days. A distillation token lets the student learn from a teacher through attention, and the model reaches competitive accuracy without external data.
Contribution
The work presents a convolution-free transformer trained on ImageNet only, in less than three days on a single computer, and introduces a transformer-specific teacher-student strategy in which a distillation token lets the student learn from the teacher through attention.
Findings
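Achieved 83.1% top-1 accuracy (single-crop evaluation) on ImageNet with the 86M-parameter reference model, using no external data.
Introduced a token-based distillation method for transformers, shown to be especially useful when the teacher is a convnet.
Reported results competitive with convnets on ImageNet and on transfer tasks.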
Abstract
Recently, neural networks purely based on attention were shown to address image understanding tasks such as image classification. However, these visual transformers are pre-trained with hundreds of millions of images using an expensive infrastructure, thereby limiting their adoption. In this work, we produce a competitive convolution-free transformer by training on ImageNet only. We train them on a single computer in less than 3 days. Our reference vision transformer (86M parameters) achieves top-1 accuracy of 83.1% (single-crop evaluation) on ImageNet with no external data. More importantly, we introduce a teacher-student strategy specific to transformers. It relies on a distillation token ensuring that the student learns from the teacher through attention. We show the interest of this token-based distillation, especially when using a convnet as a teacher. This leads us to report…
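The abstract does not spell out the training objective behind the distillation token. As a minimal sketch, assuming the hard-label variant in which the class-token head is supervised by the ground-truth label and the distillation-token head by the teacher's predicted label, the two terms can be combined as below. The function names and the equal 0.5/0.5 weighting are assumptions made for illustration, not taken from this page.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    # Negative log-probability assigned to the target class.
    return -math.log(softmax(logits)[target])

def hard_distillation_loss(cls_logits, dist_logits, teacher_logits, label):
    # Hypothetical helper: cls_logits come from the class-token head,
    # dist_logits from the distillation-token head of the student.
    # The teacher contributes only its argmax prediction (hard label).
    teacher_label = max(range(len(teacher_logits)), key=teacher_logits.__getitem__)
    loss_true = cross_entropy(cls_logits, label)             # student vs. ground truth
    loss_teacher = cross_entropy(dist_logits, teacher_label)  # student vs. teacher
    return 0.5 * loss_true + 0.5 * loss_teacher               # assumed equal weighting

# Usage: three classes, true label 0, teacher predicts class 2.
loss = hard_distillation_loss(
    cls_logits=[2.0, 0.5, 0.1],
    dist_logits=[0.3, 0.2, 1.5],
    teacher_logits=[0.1, 0.4, 3.0],
    label=0,
)
print(round(loss, 4))
```

Because the two heads read different tokens, the distillation token can follow the teacher even when the teacher disagrees with the ground-truth label, while the class token keeps tracking the true label.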
Code & Models
- 🤗 facebook/deit-base-distilled-patch16-224 (model, 7.4k downloads, 33 likes)
- 🤗 facebook/deit-base-distilled-patch16-384 (model, 76k downloads, 8 likes)
- 🤗 facebook/deit-base-patch16-224 (model, 24k downloads, 15 likes)
- 🤗 facebook/deit-base-patch16-384 (model, 219 downloads, 3 likes)
- 🤗 facebook/deit-small-distilled-patch16-224 (model, 1.2k downloads, 7 likes)
- 🤗 facebook/deit-small-patch16-224 (model, 14k downloads, 11 likes)
- 🤗 facebook/deit-tiny-distilled-patch16-224 (model, 548 downloads, 9 likes)
- 🤗 facebook/deit-tiny-patch16-224 (model, 126k downloads, 12 likes)
- 🤗 OWG/DeiT (model, 1 like)
- 🤗 kadirnar/timm_model_list (model, 1 like)
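The checkpoint names listed above appear to follow a `patch<P>-<R>` convention, with patch size P and input resolution R in pixels, plus a `distilled` marker for models that carry the extra distillation token. Under that assumption, the sketch below works out the transformer's input sequence length for a given name. The helper `token_count` is hypothetical and only parses names of this shape.

```python
import re

def token_count(model_id):
    # Hypothetical helper: assumes the id ends in "patch<P>-<R>".
    match = re.search(r"patch(\d+)-(\d+)$", model_id)
    if match is None:
        raise ValueError(f"unrecognized checkpoint name: {model_id}")
    patch, resolution = int(match.group(1)), int(match.group(2))
    patches = (resolution // patch) ** 2      # square grid of image patches
    extra = 1                                 # class token
    if "distilled" in model_id:
        extra += 1                            # distillation token
    return patches + extra

print(token_count("facebook/deit-base-distilled-patch16-224"))  # 14*14 + 2 = 198
print(token_count("facebook/deit-base-patch16-384"))            # 24*24 + 1 = 577
```

The 384-pixel checkpoints process roughly three times as many tokens as the 224-pixel ones, which is the main cost of the higher resolution.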
Taxonomy
Topics: Currency Recognition and Detection
