A Flexible Template for Edge Generative AI with High-Accuracy Accelerated Softmax & GELU
Andrea Belano, Yvan Tortorella, Angelo Garofalo, Luca Benini, Davide Rossi, Francesco Conti

TL;DR
This paper presents SoftEx, a specialized hardware accelerator for fast and energy-efficient softmax and GELU computation in Transformer-based generative AI models.
Contribution
Introduction of SoftEx, a novel accelerator for softmax and GELU, integrated into a heterogeneous RISC-V based cluster, achieving significant speedups and energy savings.
Findings
SoftEx achieves up to 10.8x speedup for softmax.
SoftEx reduces softmax energy consumption by up to 10.8x.
End-to-end ViT inference throughput increases by 1.58x.
Abstract
Transformer-based generative Artificial Intelligence (GenAI) models achieve remarkable results in a wide range of fields, including natural language processing, computer vision, and audio processing. However, this comes at the cost of increased complexity and the need for sophisticated non-linearities such as softmax and GELU. Even though Transformers are computationally dominated by matrix multiplications (MatMul), these non-linearities can become a performance bottleneck, especially if dedicated hardware is used to accelerate MatMul operators. In this work, we introduce a GenAI BFloat16 Transformer acceleration template based on a heterogeneous tightly-coupled cluster containing 256KiB of shared SRAM, 8 general-purpose RISC-V cores, a 24x8 systolic array MatMul accelerator, and a novel accelerator for Transformer softmax and GELU non-linearities: SoftEx. SoftEx introduces an approximate…
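For readers unfamiliar with the two non-linearities named in the abstract, the following is a minimal reference sketch of their standard textbook definitions in plain Python. It is not SoftEx's algorithm: the abstract is truncated before the approximation is described, so nothing here reflects the accelerator's internals. The function names `softmax`, `gelu`, and `gelu_tanh` are illustrative choices, and the code uses double precision rather than BFloat16.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats.

    Subtracting the maximum before exponentiating avoids overflow
    without changing the result.
    """
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gelu(x):
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x):
    """Widely used tanh-based approximation of GELU."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


if __name__ == "__main__":
    probs = softmax([1.0, 2.0, 3.0])
    print(probs, sum(probs))
    print(gelu(1.0), gelu_tanh(1.0))
```

Both functions rely on transcendental operations (`exp`, `erf`, `tanh`) evaluated once per element, which is why they can become a bottleneck once the MatMul operators are offloaded to dedicated hardware.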
Taxonomy
Topics: Neural Networks and Applications · Distributed and Parallel Computing Systems
