TFormer: A Transmission-Friendly ViT Model for IoT Devices
Zhichao Lu, Chuntao Ding, Felix Juefei-Xu, Vishnu Naresh Boddeti, Shangguang Wang, and Yun Yang

TL;DR
TFormer is a new vision transformer model designed for resource-limited IoT devices, combining hybrid layers and PCS-FFN to reduce parameters and FLOPs while maintaining high accuracy across multiple vision tasks.
Contribution
The paper introduces TFormer, a transmission-friendly ViT model with novel hybrid layers and PCS-FFN, enabling efficient deployment on IoT devices with high performance.
Findings
Outperforms state-of-the-art models on ImageNet-1K, MS COCO, and ADE20K datasets.
TFormer-S achieves 5% higher accuracy than ResNet18 with fewer parameters and FLOPs.
Demonstrates effectiveness across image classification, object detection, and semantic segmentation.
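Because the comparison above is stated in terms of parameter count, it may help to see how a parameter count translates into the amount of data a cloud server has to send to a device. The following is a minimal sketch, not something taken from the paper: the helper name `payload_mib` is hypothetical, the ResNet18 figure is the commonly cited approximate count of about 11.7 million parameters for the ImageNet-1K classifier, and container or serialization overhead is ignored.

```python
def payload_mib(num_params: int, bytes_per_param: int = 4) -> float:
    """Approximate size of the raw weights in MiB.

    Hypothetical helper for illustration only; it ignores file-format
    overhead, compression, and any non-weight buffers.
    """
    return num_params * bytes_per_param / (1024 ** 2)


# Approximate parameter count of ResNet18 (assumption: ~11.7 million).
RESNET18_PARAMS = 11_700_000

fp32 = payload_mib(RESNET18_PARAMS)      # weights stored as 32-bit floats
fp16 = payload_mib(RESNET18_PARAMS, 2)   # weights stored as 16-bit floats

print(f"ResNet18 weights: {fp32:.1f} MiB (fp32), {fp16:.1f} MiB (fp16)")
```

Under these assumptions the payload scales linearly with the parameter count, so any reduction in parameters shrinks the transmitted model by the same factor at a fixed numeric precision.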
Abstract
Deploying high-performance vision transformer (ViT) models on ubiquitous Internet of Things (IoT) devices to provide high-quality vision services will revolutionize the way we live, work, and interact with the world. Due to the contradiction between the limited resources of IoT devices and resource-intensive ViT models, the use of cloud servers to assist ViT model training has become mainstream. However, due to the larger number of parameters and floating-point operations (FLOPs) of the existing ViT models, the model parameters transmitted by cloud servers are large and difficult to run on resource-constrained IoT devices. To this end, this paper proposes a transmission-friendly ViT model, TFormer, for deployment on resource-constrained IoT devices with the assistance of a cloud server. The high performance and small number of model parameters and FLOPs of TFormer are attributed to the…
Taxonomy
Topics: IoT and Edge/Fog Computing · Brain Tumor Detection and Classification · Advanced Neural Network Applications
Methods: Multi-Head Attention · Attention Is All You Need · Softmax · Residual Connection · Layer Normalization · Linear Layer · Dense Connections · Vision Transformer · Convolution
