A Transformer-in-Transformer Network Utilizing Knowledge Distillation for Image Recognition
Dewan Tauhid Rahman, Yeahia Sarker, Antar Mazumder, Md. Shamim Anower

TL;DR
This paper introduces a Transformer-in-Transformer network with knowledge distillation for image recognition, improving efficiency and accuracy by capturing global and local image features and leveraging a teacher-student training paradigm.
Contribution
A novel inner-outer transformer architecture combined with knowledge distillation improves learning efficiency and is reported to achieve state-of-the-art accuracy on three image classification datasets (CIFAR-10, CIFAR-100, and MNIST).
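The inner-outer idea can be illustrated by how an image is decomposed into tokens at two scales. The following is a minimal sketch that assumes a Transformer-in-Transformer-style split: the image is cut into patches (outer tokens, the global view), and each patch is cut again into smaller sub-patches (inner tokens, the local view). The single-channel nested-list input, the 16 x 16 patch size, the 4 x 4 sub-patch size, and the helper names `split_into_blocks` and `inner_outer_tokens` are hypothetical choices for illustration. None of them come from the paper.

```python
from typing import List, Tuple

# A single-channel H x W image stored as nested lists (hypothetical input format).
Image = List[List[float]]


def split_into_blocks(img: Image, size: int) -> List[Image]:
    """Cut an H x W image into non-overlapping size x size blocks, row by row."""
    h, w = len(img), len(img[0])
    if h % size or w % size:
        raise ValueError("image height and width must be divisible by the block size")
    blocks = []
    for top in range(0, h, size):
        for left in range(0, w, size):
            blocks.append([row[left:left + size] for row in img[top:top + size]])
    return blocks


def inner_outer_tokens(
    img: Image, patch: int, sub_patch: int
) -> Tuple[List[Image], List[List[List[float]]]]:
    """Return (outer_patches, inner_tokens).

    outer_patches[i] is the i-th patch (the coarse, global-level unit).
    inner_tokens[i][j] is the j-th sub-patch of patch i, flattened to a
    vector (the fine, local-level unit).
    """
    outer = split_into_blocks(img, patch)
    inner = []
    for p in outer:
        subs = split_into_blocks(p, sub_patch)
        inner.append([[v for row in s for v in row] for s in subs])
    return outer, inner


# Usage: a 32 x 32 image (CIFAR-sized), 16 x 16 patches, 4 x 4 sub-patches.
img = [[float(r * 32 + c) for c in range(32)] for r in range(32)]
outer, inner = inner_outer_tokens(img, patch=16, sub_patch=4)
print(len(outer), len(inner[0]), len(inner[0][0]))  # 4 16 16
```

In TNT-style models, each flattened inner token and each outer patch is typically projected to an embedding. Attention then runs among the inner tokens of each patch, and the result is folded back into the corresponding outer token before attention across patches. How this paper wires the two levels together is not stated in this summary.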
Findings
Achieved top-1 accuracy of 74.71% on CIFAR-100
Attained 92.03% top-1 accuracy on CIFAR-10
Secured 99.56% top-1 accuracy on MNIST
Abstract
This paper presents a novel knowledge distillation neural architecture that leverages efficient transformer networks for image classification. Natural images have intricate structures that contain many extraneous elements. Vision transformers compute attention over localized patches; however, relying exclusively on patch segmentation is insufficient to capture the image as a whole. To address this issue, we propose an inner-outer transformer-based architecture that attends to both the global and the local aspects of the image. Moreover, training transformer models is challenging because of their heavy resource, time, and data requirements. To tackle this, we integrate knowledge distillation into the architecture, enabling efficient learning. Leveraging insights from a larger teacher model, our…
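The abstract is cut off before it specifies the distillation objective, so the following is only a sketch of the common soft-target formulation of knowledge distillation. It is not the authors' confirmed loss, and the paper may, for example, use a distillation token instead. The student is trained on a weighted sum of two terms. The first is the KL divergence between the teacher's and the student's temperature-softened class distributions, scaled by T squared. The second is ordinary cross-entropy on the hard label. The temperature of 4.0, the weight `alpha` of 0.5, and the helper names are hypothetical defaults.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> list:
    """Numerically stable softmax; a temperature T > 1 flattens the distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits: Sequence[float], teacher_logits: Sequence[float],
            label: int, temperature: float = 4.0, alpha: float = 0.5) -> float:
    """Soft-target distillation loss for a single example (hypothetical defaults).

    alpha weights the teacher-matching KL term; (1 - alpha) weights the
    cross-entropy against the ground-truth label.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student); terms with zero teacher probability contribute nothing.
    kl = sum(t * (math.log(t) - math.log(s))
             for t, s in zip(p_teacher, p_student) if t > 0)
    # Hard-label cross-entropy uses the unsoftened (T = 1) student distribution.
    ce = -math.log(softmax(student_logits)[label])
    # T^2 keeps the soft-target gradients on a scale comparable to the hard-label term.
    return alpha * (temperature ** 2) * kl + (1 - alpha) * ce


# Usage: a student whose logits are close to the teacher's incurs a small KL term.
teacher = [3.0, 1.0, 0.2]
student = [2.5, 1.2, 0.1]
loss = kd_loss(student, teacher, label=0)
```

Scaling by T squared is the usual convention. Softening with a temperature divides the gradients of the KL term by roughly T squared, so the factor restores their magnitude when `alpha` and T are tuned independently.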
Taxonomy
Topics: Neural Networks and Applications
Methods: Softmax · Attention Is All You Need · Knowledge Distillation
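The methods listed above include softmax and the scaled dot-product attention introduced in "Attention Is All You Need", computed as softmax(Q K^T / sqrt(d_k)) V. The following is a minimal single-head sketch of that operation over a set of tokens. It deliberately omits the learned query/key/value projections, multiple heads, and the batching of a real implementation. The helper names are hypothetical. In an inner-outer design, the same operation would be applied once among the sub-patch tokens inside each patch and once among the patch tokens across the whole image.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows are tokens, columns are feature dimensions


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax_rows(m: Matrix) -> Matrix:
    """Numerically stable softmax applied independently to each row."""
    out = []
    for row in m:
        mx = max(row)
        exps = [math.exp(v - mx) for v in row]
        s = sum(exps)
        out.append([e / s for e in exps])
    return out


def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """softmax(Q K^T / sqrt(d_k)) V for a single head."""
    d_k = len(k[0])
    scores = matmul(q, transpose(k))
    scaled = [[s / math.sqrt(d_k) for s in row] for row in scores]
    weights = softmax_rows(scaled)  # each row sums to 1 over the keys
    return matmul(weights, v)


# Usage: three 2-dimensional tokens attending to each other (self-attention).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mixed = scaled_dot_product_attention(tokens, tokens, tokens)
```

Dividing by sqrt(d_k) keeps the dot products from growing with the feature dimension, which would otherwise push the softmax toward near one-hot weights.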
