Escaping the Big Data Paradigm with Compact Transformers
Ali Hassani, Steven Walton, Nikhil Shah, Abulikemu Abuduweili, Jiachen Li, Humphrey Shi

TL;DR
This paper introduces Compact Transformers, demonstrating their effectiveness on small datasets by avoiding overfitting, outperforming CNNs, and achieving state-of-the-art results with significantly fewer parameters, making transformers more accessible for resource-limited scenarios.
Contribution
The paper presents a novel compact transformer architecture optimized for small datasets, showing it can outperform CNNs and previous transformers in data efficiency and accuracy.
Findings
Achieves 98% accuracy on CIFAR-10 with only 3.7M parameters.
Sets a new state of the art on Flowers-102 with 99.76% top-1 accuracy.
Outperforms many CNN and NAS-based models with fewer parameters.
Abstract
With the rise of Transformers as the standard for language processing, and their advancements in computer vision, there has been a corresponding growth in parameter size and amounts of training data. Many have come to believe that because of this, transformers are not suitable for small sets of data. This trend leads to concerns such as: limited availability of data in certain scientific domains and the exclusion of those with limited resources from research in the field. In this paper, we aim to present an approach for small-scale learning by introducing Compact Transformers. We show for the first time that with the right size and convolutional tokenization, transformers can avoid overfitting and outperform state-of-the-art CNNs on small datasets. Our models are flexible in terms of model size, and can have as few as 0.28M parameters while achieving competitive results. Our best model…
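The convolutional tokenization mentioned in the abstract can be illustrated with a little sequence-length arithmetic. The sketch below is not the authors' implementation. It assumes a hypothetical tokenizer made of stacked blocks, each a 3x3 convolution (stride 1, padding 1) followed by 3x3 max pooling (stride 2, padding 1), and compares the number of tokens it produces with a non-overlapping patch split. The kernel, stride, and padding values and the function names are illustrative assumptions, not taken from this page.

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_tokenizer_seq_len(image_size, n_layers, conv=(3, 1, 1), pool=(3, 2, 1)):
    """Number of tokens after n_layers of (conv, max-pool) blocks.

    conv and pool are (kernel, stride, padding) tuples; the defaults
    are assumed values chosen only for illustration.
    """
    size = image_size
    for _ in range(n_layers):
        size = conv_out(size, *conv)  # convolution keeps the size with these defaults
        size = conv_out(size, *pool)  # pooling halves it
    return size * size  # each remaining spatial position becomes one token


def patch_seq_len(image_size, patch_size):
    """Number of tokens from a non-overlapping patch split."""
    return (image_size // patch_size) ** 2


if __name__ == "__main__":
    # 32x32 inputs, the CIFAR-10 image size
    print(conv_tokenizer_seq_len(32, n_layers=1))  # 256
    print(conv_tokenizer_seq_len(32, n_layers=2))  # 64
    print(patch_seq_len(32, patch_size=4))         # 64
```

Under these assumptions, two convolutional blocks yield the same 64 tokens as a 4x4 patch split, but each token is computed from overlapping neighborhoods rather than from an isolated patch.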
Taxonomy
Topics: Advanced Neural Network Applications · Topic Modeling · Multimodal Machine Learning Applications
Methods: Multi-Head Attention · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Compact Convolutional Transformers · Convolution · Layer Normalization · Dropout · Label Smoothing · Transformer
