LAPA: Log-Domain Prediction-Driven Dynamic Sparsity Accelerator for Transformer Model
Huizheng Wang, Hongbin Wang, Shaojun Wei, Yang Hu, Shouyi Yin

TL;DR
LAPA is a log-domain prediction architecture for dynamic sparsity in Transformer models. It pairs specialized prediction algorithms with dedicated hardware to improve energy efficiency across multiple stages.
Contribution
It presents a cross-stage sparse acceleration strategy with a new log-domain prediction algorithm-architecture co-design for Transformer models.
Findings
LAPA achieves up to 3.52× higher energy efficiency than state-of-the-art (SOTA) methods.
The proposed algorithms reduce computational overhead in dynamic sparsity prediction.
Hardware implementation demonstrates practical efficiency improvements.
Abstract
Attention-based Transformers have revolutionized natural language processing (NLP) and shown strong performance in computer vision (CV) tasks. However, as the input sequence varies, the computational bottlenecks in Transformer models exhibit dynamic behavior across stages, which calls for a cross-stage sparse acceleration strategy. Unfortunately, most existing sparse Transformer approaches are single-stage based, and their sparsity prediction mechanisms lead to significant power overhead when applied across multiple stages. To this end, this paper proposes a log-domain attention prediction algorithm-architecture co-design, named LAPA. First, an asymmetric leading one computing (ALOC) scheme is designed to eliminate expensive multiplications. Next, a mixed-precision multi-round shifting accumulation (MRSA) mechanism is further proposed to mitigate the accumulation overhead. A…
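The abstract says the leading-one scheme removes multiplications from attention prediction. A minimal sketch of that general idea follows: each integer operand is rounded down to the power of two given by its leading one, so a product becomes an addition of bit positions followed by a shift. This is a generic leading-one approximation, not the paper's ALOC scheme. The asymmetric treatment of the operands and the MRSA accumulation are not described in this excerpt and are not modeled here. The function names (`leading_one`, `approx_mul`, `approx_dot`, `predict_top_k`) and the top-k selection step are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence


def leading_one(x: int) -> int:
    """Bit position of the most significant set bit of a positive integer."""
    return x.bit_length() - 1


def approx_mul(a: int, b: int) -> int:
    """Approximate a * b with no multiplier.

    Both magnitudes are rounded down to a power of two, so the product
    reduces to adding two leading-one positions and shifting once.
    """
    if a == 0 or b == 0:
        return 0
    sign = -1 if (a < 0) != (b < 0) else 1
    shift = leading_one(abs(a)) + leading_one(abs(b))
    return sign * (1 << shift)


def approx_dot(q: Sequence[int], k: Sequence[int]) -> int:
    """Approximate dot product built from shift-only products."""
    return sum(approx_mul(a, b) for a, b in zip(q, k))


def predict_top_k(q: Sequence[int], keys: Sequence[Sequence[int]], k: int) -> List[int]:
    """Return the indices of the k keys with the largest approximate scores.

    Assumption: sparsity is predicted by keeping the top-k scores; the
    excerpt does not state which selection rule LAPA uses.
    """
    scores = [approx_dot(q, key) for key in keys]
    order = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:k])


if __name__ == "__main__":
    print(approx_mul(5, 3))  # exact product is 15; approximation is 1 << (2 + 1)
    print(predict_top_k([4, 2], [[1, 1], [8, 8], [-4, 2], [2, 0]], k=2))
```

Rounding down always underestimates the magnitude of each product, which may be acceptable when only the relative ranking of attention scores matters for deciding what to skip.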
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Memory and Neural Computing · Image Enhancement Techniques
