TFormer: A Transmission-Friendly ViT Model for IoT Devices

Zhichao Lu, Chuntao Ding, Felix Juefei-Xu, Vishnu Naresh Boddeti, Shangguang Wang, and Yun Yang

arXiv:2302.07734·cs.CV·February 16, 2023

TL;DR

TFormer is a new vision transformer model designed for resource-limited IoT devices, combining hybrid layers and PCS-FFN to reduce parameters and FLOPs while maintaining high accuracy across multiple vision tasks.

Contribution

The paper introduces TFormer, a transmission-friendly ViT model with novel hybrid layers and PCS-FFN, enabling efficient deployment on IoT devices with high performance.

Findings

01. Outperforms state-of-the-art models on ImageNet-1K, MS COCO, and ADE20K datasets.

02. TFormer-S achieves 5% higher accuracy than ResNet18 with fewer parameters and FLOPs.

03. Demonstrates effectiveness across image classification, object detection, and semantic segmentation.
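
A rough calculation can illustrate why a smaller parameter count makes a model easier to transmit. The sketch below is not from the paper: it assumes dense weights sent without compression, and the helper name `payload_megabytes` is hypothetical. The ResNet18 figure is the parameter count of the standard torchvision ImageNet classifier; no TFormer figures are included because this page does not state them.

```python
def payload_megabytes(num_params: int, bytes_per_param: int = 4) -> float:
    """Approximate size of uncompressed dense weights in MB (1 MB = 10**6 bytes).

    Hypothetical helper for illustration; ignores file-format overhead,
    buffers such as BatchNorm running statistics, and any compression.
    """
    return num_params * bytes_per_param / 1e6


# Parameter count of the torchvision ResNet18 ImageNet classifier.
RESNET18_PARAMS = 11_689_512

fp32_mb = payload_megabytes(RESNET18_PARAMS)      # 32-bit floats, 4 bytes each
fp16_mb = payload_megabytes(RESNET18_PARAMS, 2)   # 16-bit floats, 2 bytes each

print(f"ResNet18 fp32 payload: {fp32_mb:.1f} MB")
print(f"ResNet18 fp16 payload: {fp16_mb:.1f} MB")
```

Under these assumptions the payload scales linearly with the parameter count, so any reduction in parameters translates directly into fewer bytes sent from the cloud server to the device.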

Abstract

Deploying high-performance vision transformer (ViT) models on ubiquitous Internet of Things (IoT) devices to provide high-quality vision services will revolutionize the way we live, work, and interact with the world. Because of the mismatch between the limited resources of IoT devices and resource-intensive ViT models, using cloud servers to assist ViT model training has become mainstream. However, because existing ViT models have large numbers of parameters and floating-point operations (FLOPs), the models transmitted by cloud servers are large and difficult to run on resource-constrained IoT devices. To this end, this paper proposes a transmission-friendly ViT model, TFormer, for deployment on resource-constrained IoT devices with the assistance of a cloud server. The high performance and small number of model parameters and FLOPs of TFormer are attributed to the…

Taxonomy

Topics: IoT and Edge/Fog Computing · Brain Tumor Detection and Classification · Advanced Neural Network Applications

Methods: Multi-Head Attention · Attention Is All You Need · Softmax · Residual Connection · Layer Normalization · Linear Layer · Dense Connections · Vision Transformer · Convolution