Atleus: Accelerating Transformers on the Edge Enabled by 3D Heterogeneous Manycore Architectures
Pratyush Dhingra, Janardhan Rao Doppa, and Partha Pratim Pande

TL;DR
Atleus is a novel 3D heterogeneous architecture designed to accelerate transformer models for edge applications, supporting both fine-tuning and inference with significant improvements in performance and energy efficiency.
Contribution
The paper introduces Atleus, a 3D heterogeneous platform optimized for transformer acceleration, incorporating non-volatile memory, systolic arrays, and a tailored NoC for edge deployment.
Findings
Achieves up to 56x performance improvement over existing solutions.
Attains 64.5x better energy efficiency compared with the state of the art.
Effectively supports model compression through quantization schemes.
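The summary does not specify which quantization schemes Atleus supports. The sketch below is only a generic illustration of what "quantization" means here: symmetric, per-tensor uniform quantization of weights to signed b-bit integers with one shared scale. The function names `quantize_symmetric` and `dequantize` are hypothetical and are not taken from the paper. This is not the authors' scheme.

```python
from typing import List, Tuple


def quantize_symmetric(weights: List[float], num_bits: int = 8) -> Tuple[List[int], float]:
    """Hypothetical helper: map floats to signed integers in [-qmax, qmax]
    using a single shared scale (symmetric, per-tensor quantization)."""
    qmax = 2 ** (num_bits - 1) - 1  # e.g. 127 for 8 bits, 7 for 4 bits
    max_abs = max((abs(w) for w in weights), default=0.0)
    # The largest-magnitude weight maps to +/-qmax; fall back to 1.0 for all-zero input.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Python's round() is round-half-to-even; clamp guards against overflow.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Hypothetical helper: recover approximate floats from integer codes."""
    return [v * scale for v in q]


if __name__ == "__main__":
    w = [0.5, -1.27, 0.0, 1.0]
    q, s = quantize_symmetric(w)
    print("int codes:", q, "scale:", s)
    print("reconstructed:", dequantize(q, s))
```

Lowering `num_bits` shrinks storage and arithmetic width at the cost of a larger rounding error (at most half the scale per weight). That trade-off is what makes quantization attractive for memory- and energy-constrained edge hardware.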
Abstract
Transformer architectures have become the standard neural network model for various machine learning applications including natural language processing and computer vision. However, the compute and memory requirements introduced by transformer models make them challenging to adopt for edge applications. Furthermore, fine-tuning pre-trained transformers (e.g., foundation models) is a common task to enhance the model's predictive performance on specific tasks/applications. Existing transformer accelerators are oblivious to complexities introduced by fine-tuning. In this paper, we propose the design of a three-dimensional (3D) heterogeneous architecture referred to as Atleus that incorporates heterogeneous computing resources specifically optimized to accelerate transformer models for the dual purposes of fine-tuning and inference. Specifically, Atleus utilizes non-volatile memory and…
