DeViT: Decomposing Vision Transformers for Collaborative Inference in Edge Devices

Guanyu Xu; Zhiwei Hao; Yong Luo; Han Hu; Jianping An; Shiwen Mao

arXiv:2309.05015·cs.CV·September 12, 2023·1 citation

TL;DR

DeViT introduces a method to decompose large vision transformers into smaller models for collaborative, energy-efficient inference on edge devices, maintaining comparable accuracy while significantly improving speed and reducing resource consumption.

Contribution

The paper proposes a novel framework and algorithm for decomposing ViTs into smaller models, enabling real-time, energy-efficient collaborative inference on resource-limited edge devices.

Findings

01. Achieves a 2.89× latency reduction with minimal accuracy loss on CIFAR-100.

02. Surpasses recent efficient ViT models in both accuracy and speed on ImageNet-1K.

03. Reduces energy consumption by over 55% on edge devices.
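The latency finding can be read with a simple back-of-the-envelope model. The sketch below assumes, for illustration only, that the decomposed sub-models run fully in parallel on separate devices, so end-to-end latency is the slowest device plus communication and fusion time. The function names and the millisecond figures in the usage lines are hypothetical, not measurements from the paper. It also converts a speedup factor into a percentage reduction: a 2.89× speedup means the new latency is 1/2.89 ≈ 34.6% of the original, i.e. about 65.4% lower.

```python
from typing import Sequence


def parallel_latency_ms(device_ms: Sequence[float], comm_ms: float, fusion_ms: float) -> float:
    """Hypothetical end-to-end latency when sub-models run concurrently."""
    # The slowest device dominates the parallel stage; transferring features
    # and running the fusion step are assumed to happen afterwards, in series.
    return max(device_ms) + comm_ms + fusion_ms


def speedup(baseline_ms: float, collaborative_ms: float) -> float:
    """How many times faster the collaborative setup is than the baseline."""
    return baseline_ms / collaborative_ms


def latency_reduction_pct(speedup_factor: float) -> float:
    """Convert an N-times speedup into a percentage drop in latency."""
    # New latency is 1/N of the old one, so the drop is 1 - 1/N.
    return 100.0 * (1.0 - 1.0 / speedup_factor)


# Hypothetical numbers: three devices, single-model baseline of 120 ms.
t = parallel_latency_ms([40.0, 35.0, 38.0], comm_ms=5.0, fusion_ms=3.0)
print(t)                                      # 48.0
print(speedup(120.0, t))                      # 2.5
print(round(latency_reduction_pct(2.89), 1))  # 65.4
```

This toy model ignores device heterogeneity, network variability, and any overlap between computation and communication, all of which affect real edge deployments.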

Abstract

Recent years have witnessed the great success of vision transformers (ViTs), which have achieved state-of-the-art performance on multiple computer vision benchmarks. However, ViT models suffer from a vast number of parameters and high computational cost, which makes them difficult to deploy on resource-constrained edge devices. Existing solutions mostly compress ViT models into compact models, which still cannot achieve real-time inference. To tackle this issue, we propose to explore the divisibility of the transformer structure and decompose the large ViT into multiple small models for collaborative inference at edge devices. Our objective is to achieve fast and energy-efficient collaborative inference while maintaining accuracy comparable to that of large ViTs. To this end, we first propose a collaborative inference framework termed DeViT to facilitate edge deployment by decomposing large ViTs.…
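The abstract describes splitting one large ViT into several small models that run on separate edge devices and whose outputs are then combined. The snippet below is a minimal, hypothetical sketch of that collaborative-inference pattern in plain Python, not the authors' DeViT implementation: each sub-model is an arbitrary callable standing in for a small ViT, a thread pool stands in for parallel edge devices, and the concatenate-then-linear fusion head (`collaborative_inference`, `fusion_weights`, `fusion_bias`) is an assumption chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

Vector = List[float]


def collaborative_inference(
    x: Vector,
    sub_models: Sequence[Callable[[Vector], Vector]],
    fusion_weights: List[Vector],
    fusion_bias: Vector,
) -> Vector:
    """Hypothetical sketch: run small sub-models in parallel, then fuse their features."""
    if not sub_models:
        raise ValueError("need at least one sub-model")
    if len(fusion_weights) != len(fusion_bias):
        raise ValueError("one bias term is required per fusion weight row")
    # Each worker thread stands in for one edge device running one small model.
    with ThreadPoolExecutor(max_workers=len(sub_models)) as pool:
        features = list(pool.map(lambda model: model(x), sub_models))
    # Concatenate features in device order (pool.map preserves input order).
    fused = [value for feat in features for value in feat]
    if any(len(row) != len(fused) for row in fusion_weights):
        raise ValueError("each fusion weight row must match the concatenated feature size")
    # Assumed linear fusion head: logits[k] = sum_j W[k][j] * fused[j] + b[k]
    return [sum(w * f for w, f in zip(row, fused)) + b
            for row, b in zip(fusion_weights, fusion_bias)]


# Toy usage: two stand-in "sub-models" producing one feature each.
logits = collaborative_inference(
    [1.0, 2.0, 3.0],
    sub_models=[lambda v: [sum(v)], lambda v: [max(v)]],
    fusion_weights=[[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
    fusion_bias=[0.0, 0.0, 1.0],
)
print(logits)  # [6.0, 3.0, 5.5]
```

Concatenation followed by a single linear layer is only one plausible way to combine the decomposed models' outputs; the truncated abstract does not specify how DeViT performs this step.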

Taxonomy

Topics: Advanced Neural Network Applications · Brain Tumor Detection and Classification · Visual Attention and Saliency Detection

Methods: Multi-Head Attention · Attention Is All You Need · Softmax · Layer Normalization · Linear Layer · Dense Connections · Residual Connection · Vision Transformer