TReX- Reusing Vision Transformer's Attention for Efficient Xbar-based Computing

Abhishek Moitra, Abhiroop Bhattacharjee, Youngeun Kim, and Priyadarshini Panda

arXiv:2408.12742·cs.AI·August 26, 2024

TL;DR

TReX is a framework that reuses attention in Vision Transformers (ViTs) to optimize the tradeoff between accuracy, energy, delay, and area, improving hardware efficiency while maintaining near iso-accuracy.

Contribution

TReX introduces an attention-reuse method for ViTs that balances accuracy and efficiency, addressing the attention-block overheads that prior in-memory computing (IMC) implementations neglected.

Findings

01. Achieves a 2.3x reduction in energy-delay-area product (EDAP) on ImageNet-1k

02. Improves TOPS/mm² by 1.86x with minimal accuracy loss

03. Outperforms state-of-the-art token pruning and weight sharing methods

Abstract

Due to the high computation overhead of Vision Transformers (ViTs), In-memory Computing (IMC) architectures are being researched for energy-efficient deployment in edge-computing scenarios. Prior works have proposed efficient algorithm-hardware co-design and IMC-architectural improvements to improve the energy efficiency of IMC-implemented ViTs. However, all prior works have neglected the overhead and co-dependence of attention blocks on the accuracy-energy-delay-area of IMC-implemented ViTs. To this end, we propose TReX, an attention-reuse-driven ViT optimization framework that effectively performs attention reuse in ViT models to achieve optimal accuracy-energy-delay-area tradeoffs. TReX optimally chooses the transformer encoders for attention reuse to achieve near iso-accuracy performance while meeting the user-specified delay requirement. Based on our analysis on the Imagenet-1k…
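
To make the idea of attention reuse concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: it assumes that a "reusing" encoder skips its own query/key projections and softmax and instead applies the attention map computed by the most recent non-reusing encoder. The single-head layout, the helper names (`attention_map`, `run_encoders`), the `reuse` mask, and all tensor sizes are hypothetical; the abstract does not specify how TReX selects encoders or what exactly is reused.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_map(x, wq, wk):
    # Single-head attention map softmax(Q K^T / sqrt(d)); shape (tokens, tokens).
    q, k = x @ wq, x @ wk
    return softmax(q @ k.T / np.sqrt(q.shape[-1]))

def run_encoders(x, layers, reuse):
    # Hypothetical simplified encoder stack (attention plus residual only).
    # reuse[i] == True means encoder i reuses the last computed attention map
    # instead of computing its own (assumption made for this sketch).
    attn, computed = None, 0
    for layer, r in zip(layers, reuse):
        if not r or attn is None:
            attn = attention_map(x, layer["wq"], layer["wk"])
            computed += 1
        x = x + attn @ (x @ layer["wv"])  # value projection still runs per layer
    return x, computed

rng = np.random.default_rng(0)
tokens, dim, depth = 4, 8, 6  # toy sizes chosen for illustration only
layers = [
    {name: rng.normal(scale=0.1, size=(dim, dim)) for name in ("wq", "wk", "wv")}
    for _ in range(depth)
]
x = rng.normal(size=(tokens, dim))

# Hypothetical reuse mask: encoders 1, 3, and 4 reuse an earlier attention map.
reuse = [False, True, False, True, True, False]
out, computed = run_encoders(x, layers, reuse)
print(f"attention maps computed: {computed} of {depth}")
```

In this toy setup, each reusing encoder avoids two matrix multiplications and a softmax, which is the kind of attention-block overhead the abstract says prior IMC work neglected. Which encoders can reuse attention without hurting accuracy, and how that choice interacts with a delay budget, is what the framework is described as optimizing; the fixed mask above does not model that selection.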

Taxonomy

Topics: CCD and CMOS Imaging Sensors

Methods: Softmax · Attention Is All You Need · Pruning