Accelerating Vision Transformers on Brain Processing Unit
Jinchi Tang, Yan Guo

TL;DR
This paper presents a novel method to restructure Vision Transformers for efficient deployment on Brain Processing Units, enabling significant speedups while maintaining high accuracy without retraining.
Contribution
The authors redesign Vision Transformer layers into convolutional operators to fully utilize BPU hardware, allowing deployment without retraining or fine-tuning.
Findings
Achieves 3.8x inference speedup on BPU with minimal accuracy loss.
Maintains competitive accuracy on ImageNet and flower datasets.
First successful deployment of Vision Transformers optimized for BPU hardware.
Abstract
With the advancement of deep learning technologies, specialized neural processing hardware such as Brain Processing Units (BPUs) has emerged as a dedicated platform for CNN acceleration, offering optimized INT8 computation for convolutional operations. Meanwhile, Vision Transformer (ViT) models, such as the Data-efficient Image Transformer (DeiT), have demonstrated superior performance and play increasingly crucial roles in computer vision tasks. However, there is an architectural mismatch between CNN-optimized hardware and Vision Transformer computation: linear layers in Transformers operate on three-dimensional data, while BPU acceleration is designed for four-dimensional convolution operations. As a result, it is difficult or even impossible to leverage the BPU's advantages when deploying Vision Transformers. To address this challenge, we propose a novel…
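The abstract is cut off before the method is described, so the following is only an illustrative sketch of the general idea behind recasting a Transformer linear layer as a convolution. It is not the authors' implementation. It relies on a standard identity: a linear layer applied to each token gives the same result as a 1x1 convolution over a channel-first feature map that reuses the same weights, which is why no retraining is needed for this step. All function names, the tiny shapes, and the token-to-grid layout are assumptions made for illustration, and the batch dimension is omitted for brevity.

```python
import random

def linear(tokens, weight, bias):
    """Apply y = W x + b to every token. tokens: [N][D_in], weight: [D_out][D_in]."""
    return [
        [sum(w * x for w, x in zip(row, tok)) + b for row, b in zip(weight, bias)]
        for tok in tokens
    ]

def conv1x1(fmap, weight, bias):
    """1x1 convolution over fmap[C_in][H][W] with weight[C_out][C_in]."""
    c_in, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [
        [
            [sum(weight[o][c] * fmap[c][i][j] for c in range(c_in)) + bias[o]
             for j in range(w)]
            for i in range(h)
        ]
        for o in range(len(weight))
    ]

def tokens_to_fmap(tokens, h, w):
    """Reshape [N][D] tokens (N = h * w) into a channel-first map [D][h][w]."""
    d = len(tokens[0])
    return [[[tokens[i * w + j][c] for j in range(w)] for i in range(h)]
            for c in range(d)]

def fmap_to_tokens(fmap):
    """Inverse of tokens_to_fmap: [D][h][w] -> [h * w][D]."""
    d, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [[fmap[c][i][j] for c in range(d)] for i in range(h) for j in range(w)]

# Hypothetical tiny shapes: a 2x3 grid of tokens, 4 input and 5 output features.
random.seed(0)
H, W, D_IN, D_OUT = 2, 3, 4, 5
tokens = [[random.uniform(-1, 1) for _ in range(D_IN)] for _ in range(H * W)]
weight = [[random.uniform(-1, 1) for _ in range(D_IN)] for _ in range(D_OUT)]
bias = [random.uniform(-1, 1) for _ in range(D_OUT)]

# Same weights, two code paths: token-wise linear layer vs. 1x1 convolution.
y_linear = linear(tokens, weight, bias)
y_conv = fmap_to_tokens(conv1x1(tokens_to_fmap(tokens, H, W), weight, bias))

max_diff = max(abs(a - b)
               for ra, rb in zip(y_linear, y_conv)
               for a, b in zip(ra, rb))
print("max abs difference:", max_diff)
```

Because the two paths share weights and differ only in data layout, the outputs agree up to floating-point rounding. How the paper handles attention, normalization, and INT8 quantization is not stated in the text above and is not covered by this sketch.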
Taxonomy
Topics: EEG and Brain-Computer Interfaces · Advanced Neural Network Applications · Advanced Memory and Neural Computing
