TinyFormer: Efficient Transformer Design and Deployment on Tiny Devices
Jianlei Yang, Jiacheng Liao, Fanding Lei, Meichen Liu, Lingkun Long, Junyi Chen, Han Wan, Bei Yu, Weisheng Zhao

TL;DR
TinyFormer is a framework for designing and deploying sparse transformer models on tiny devices such as MCUs, reaching 96.1% accuracy on CIFAR-10 and up to 12.2x faster sparse inference than CMSIS-NN under tight storage and memory limits.
Contribution
The paper introduces TinyFormer, a comprehensive framework combining supernet search, sparse model evaluation, and deployment, specifically tailored for resource-limited tiny devices.
Findings
Achieves 96.1% accuracy on CIFAR-10 within 1MB storage and 320KB memory constraints.
Provides up to 12.2x speedup in sparse inference compared to CMSIS-NN.
SparseEngine is, to the authors' knowledge, the first deployment framework capable of running sparse transformer inference on MCUs.
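To make the 1MB storage constraint concrete, here is a minimal Python sketch of a budget check for a sparse model. It rests on assumptions that are not from the paper: weights are stored as 1-byte values with one 1-byte index per nonzero, 1MB is read as 1024 * 1024 bytes, and the function names are hypothetical. TinyFormer's actual encoding and accounting may differ.

```python
# Illustrative only: the storage format below is an assumption,
# not TinyFormer's actual sparse encoding.

def sparse_storage_bytes(num_weights, sparsity, value_bytes=1, index_bytes=1):
    """Estimate bytes needed when only nonzero weights are stored.

    Each nonzero costs `value_bytes` for the weight plus `index_bytes`
    for its position (a hypothetical value + index encoding).
    """
    nonzeros = int(round(num_weights * (1.0 - sparsity)))
    return nonzeros * (value_bytes + index_bytes)


def fits_budget(num_weights, sparsity, budget_bytes=1024 * 1024):
    """Return True if the estimated sparse model fits the storage budget."""
    return sparse_storage_bytes(num_weights, sparsity) <= budget_bytes


if __name__ == "__main__":
    # A hypothetical 1M-parameter model at 90% sparsity versus fully dense.
    print(fits_budget(1_000_000, 0.9))
    print(fits_budget(1_000_000, 0.0))
```

Under this encoding each nonzero costs twice as much as a dense weight, so sparsity only reduces storage once more than half of the weights are pruned. This is one reason the index format matters on devices with so little flash.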
Abstract
Developing deep learning models on tiny devices (e.g., microcontroller units, MCUs) has attracted much attention in various embedded IoT applications. However, it is challenging to efficiently design and deploy recent advanced models (e.g., transformers) on tiny devices due to their severe hardware resource constraints. In this work, we propose TinyFormer, a framework specifically designed to develop and deploy resource-efficient transformer models on MCUs. TinyFormer consists of three components: SuperNAS, SparseNAS, and SparseEngine. SuperNAS searches for an appropriate supernet from a vast search space. SparseNAS evaluates the best sparse single-path transformer model from the identified supernet. Finally, SparseEngine efficiently deploys the searched sparse models onto MCUs. To the best of our knowledge, SparseEngine is the first deployment framework capable of performing inference of…
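As a rough illustration of why sparse inference can be faster than a dense kernel, the sketch below performs a matrix-vector product over weights stored in compressed sparse row (CSR) form, so zero weights are never read or multiplied. CSR and the helper names are assumptions chosen for illustration. The abstract does not describe SparseEngine's actual storage format or kernels.

```python
# Illustrative only: CSR is a generic sparse format used here as a stand-in,
# not a description of SparseEngine's implementation.

def dense_to_csr(matrix):
    """Convert a dense list-of-rows matrix to CSR (values, col_idx, row_ptr)."""
    values, col_idx, row_ptr = [], [], [0]
    for row in matrix:
        for j, w in enumerate(row):
            if w != 0:
                values.append(w)
                col_idx.append(j)
        # row_ptr[r + 1] marks where row r ends in `values`.
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = W @ x, touching only the stored nonzero weights."""
    y = []
    for r in range(len(row_ptr) - 1):
        acc = 0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


if __name__ == "__main__":
    W = [[0, 2, 0],
         [1, 0, 0],
         [0, 0, 0]]
    x = [3, 4, 5]
    values, col_idx, row_ptr = dense_to_csr(W)
    print(csr_matvec(values, col_idx, row_ptr, x))
```

The inner loop runs once per nonzero rather than once per matrix entry, so the multiply count falls in proportion to sparsity, at the cost of an indirect load through `col_idx` for every weight.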
Taxonomy
Topics: Advanced Neural Network Applications · Parallel Computing and Optimization Techniques · Advanced Data and IoT Technologies
