Improving the Efficiency of Transformers for Resource-Constrained Devices

Hamid Tabani; Ajay Balasubramaniam; Shabbir Marzban; Elahe Arani; Bahram Zonooz

arXiv:2106.16006·cs.LG·July 1, 2021

TL;DR

This paper analyzes vision transformers on resource-limited devices and proposes parameter clustering to significantly reduce memory transfer, speed up processing, and save energy with minimal accuracy loss.

Contribution

It introduces a method that clusters model parameters to reduce memory transfers and improve the efficiency of vision transformers on low-power devices.

Findings

01

Data transfer from main memory reduced by more than 4x using only 64 clusters

02

Achieves up to 22% speedup on mobile devices

03

Saves 39% energy with less than 0.1% accuracy loss

Abstract

Transformers achieve promising accuracy and are widely used in domains such as natural language processing and computer vision. However, because of their massive number of parameters and their memory and computation requirements, they are not well suited to resource-constrained, low-power devices. Even on high-performance and specialized devices, memory bandwidth can become a performance-limiting bottleneck. In this paper, we present a performance analysis of state-of-the-art vision transformers on several devices. We propose to reduce the overall memory footprint and memory transfers by clustering the model parameters. We show that, using only 64 clusters to represent the model parameters, it is possible to reduce the data transfer from main memory by more than 4x and achieve up to 22% speedup and 39% energy savings on mobile devices, with less than 0.1% accuracy loss.
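The abstract does not say which clustering algorithm is used or how cluster indices are stored, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a plain 1-D k-means (Lloyd's algorithm) over one weight tensor, 32-bit float weights, and one-byte indices; the function names `cluster_weights` and `reconstruct` and the random 64x64 demo tensor are hypothetical.

```python
import numpy as np


def cluster_weights(weights, n_clusters=64, n_iters=20):
    """Illustrative 1-D k-means over a weight tensor (assumed method, not the paper's)."""
    flat = weights.ravel().astype(np.float32)
    # Start the centroids at evenly spaced quantiles of the weight distribution.
    centroids = np.quantile(flat, np.linspace(0.0, 1.0, n_clusters)).astype(np.float32)
    for _ in range(n_iters):
        # Assign every weight to its nearest centroid.
        idx = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
        # Move each non-empty centroid to the mean of its assigned weights.
        sums = np.bincount(idx, weights=flat, minlength=n_clusters)
        counts = np.bincount(idx, minlength=n_clusters)
        nonempty = counts > 0
        centroids[nonempty] = (sums[nonempty] / counts[nonempty]).astype(np.float32)
    # Final assignment against the updated centroids.
    idx = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
    # 64 clusters fit in 6 bits; one byte per index is used here for simplicity.
    return centroids, idx.astype(np.uint8).reshape(weights.shape)


def reconstruct(centroids, indices):
    """Rebuild an approximate weight tensor by codebook lookup."""
    return centroids[indices]


# Hypothetical stand-in for one layer's weights.
rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64)).astype(np.float32)

centroids, indices = cluster_weights(w)
w_hat = reconstruct(centroids, indices)

# Ideal ratio if each 32-bit weight were replaced by a packed 6-bit index.
ideal_ratio = 32 / np.log2(64)
```

Under these assumptions, only the small codebook and the per-weight indices need to be fetched from main memory. Ignoring the codebook, packed 6-bit indices would give an ideal ratio of about 5.3x over 32-bit floats, and byte-aligned indices would give 4x; the paper's own accounting may differ.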
