Lightweight Transformer Architectures for Edge Devices in Real-Time Applications
Hema Hariharan Samson

TL;DR
This survey reviews lightweight transformer architectures optimized for edge devices, analyzing recent techniques and benchmarks that enable real-time AI with reduced size, latency, and power consumption.
Contribution
It systematically reviews recent lightweight transformer models, deployment strategies, and optimization techniques, providing new insights into hardware utilization and energy efficiency at the edge.
Findings
Lightweight transformers achieve 75-96% of full-model accuracy.
Model size is reduced by 4-10x and inference latency by 3-9x.
Optimal hardware utilization occurs with models of 15-40M parameters.
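To put the size figures above in concrete terms, here is a minimal back-of-the-envelope sketch. It assumes weights dominate model size and that the only change is the storage precision (32-bit floats to 8-bit integers). The summary does not say how the reported 4-10x reduction is split among quantization, pruning, and distillation, so this illustrates only the roughly 4x that precision alone would give. The helper names `model_size_mb` and `compression_ratio` are hypothetical.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def model_size_mb(num_params: int, dtype: str) -> float:
    """Approximate weight storage in megabytes (1 MB = 10**6 bytes).

    Ignores activations, optimizer state, and file-format overhead.
    """
    return num_params * BYTES_PER_PARAM[dtype] / 1e6


def compression_ratio(num_params: int, src: str = "fp32", dst: str = "int8") -> float:
    """Size ratio obtained purely by changing storage precision."""
    return model_size_mb(num_params, src) / model_size_mb(num_params, dst)


if __name__ == "__main__":
    # The 15-40M parameter range is taken from the findings above.
    for params in (15_000_000, 40_000_000):
        fp32 = model_size_mb(params, "fp32")
        int8 = model_size_mb(params, "int8")
        print(f"{params / 1e6:.0f}M params: {fp32:.0f} MB (fp32) -> {int8:.0f} MB (int8)")
```

Under these assumptions a 15M-parameter model needs about 60 MB at fp32 and 15 MB at int8. Reductions beyond 4x would have to come from removing parameters, for example through pruning or distillation.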
Abstract
The deployment of transformer-based models on resource-constrained edge devices represents a critical challenge in enabling real-time artificial intelligence applications. This comprehensive survey examines lightweight transformer architectures specifically designed for edge deployment, analyzing recent advances in model compression, quantization, pruning, and knowledge distillation techniques. We systematically review prominent lightweight variants including MobileBERT, TinyBERT, DistilBERT, EfficientFormer, EdgeFormer, and MobileViT, providing detailed performance benchmarks on standard datasets such as GLUE, SQuAD, ImageNet-1K, and COCO. Our analysis encompasses current industry adoption patterns across major hardware platforms (NVIDIA Jetson, Qualcomm Snapdragon, Apple Neural Engine, ARM architectures), deployment frameworks (TensorFlow Lite, ONNX Runtime, PyTorch Mobile, CoreML),…
Taxonomy
Topics: Advanced Neural Network Applications · Parallel Computing and Optimization Techniques · Big Data and Digital Economy
