ARTEMIS: A Mixed Analog-Stochastic In-DRAM Accelerator for Transformer Neural Networks

Salma Afifi; Ishan Thakkar; Sudeep Pasricha

arXiv:2407.12638 · cs.AR · July 18, 2024

TL;DR

ARTEMIS is a novel in-DRAM accelerator that combines analog and stochastic computing to significantly speed up transformer neural networks while reducing energy consumption, addressing the computational and memory challenges of traditional hardware.

Contribution

It introduces a mixed analog-stochastic in-DRAM architecture supporting transformer models with minimal DRAM modifications, enabling efficient in-memory processing.

Findings

01 Achieves at least 3.0x speedup over GPUs and TPUs

02 Reduces energy consumption by 1.8x compared to state-of-the-art hardware

03 Offers 1.9x better energy efficiency than existing accelerators

Abstract

Transformers have emerged as a powerful tool for natural language processing (NLP) and computer vision. Through the attention mechanism, these models have exhibited remarkable performance gains when compared to conventional approaches like recurrent neural networks (RNNs) and convolutional neural networks (CNNs). Nevertheless, transformers typically demand substantial execution time due to their extensive computations and large memory footprint. Processing-in-memory (PIM) and near-memory computing (NMC) are promising approaches to accelerating transformers, as they offer high compute parallelism and memory bandwidth. However, designing PIM/NMC architectures to support the complex operations and massive amounts of data that need to be moved between layers in transformer neural networks remains a challenge. We propose ARTEMIS, a mixed analog-stochastic in-DRAM accelerator for transformer…
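
For readers unfamiliar with the "stochastic" half of the design, the following is a minimal sketch of generic unipolar stochastic computing, in which a value in [0, 1] is encoded as the fraction of 1s in a random bitstream and multiplication reduces to a bitwise AND of two independent streams. It is an illustration of the general technique only and is not taken from the ARTEMIS circuit; the function names (`to_bitstream`, `from_bitstream`, `sc_multiply`), the stream length, and the seed are all assumptions made for this example.

```python
import random


def to_bitstream(p, length, rng):
    """Encode a probability p in [0, 1] as a unipolar stochastic bitstream.

    Each bit is 1 with probability p (hypothetical helper for illustration).
    """
    return [1 if rng.random() < p else 0 for _ in range(length)]


def from_bitstream(bits):
    """Decode a bitstream back to a value: the fraction of bits that are 1."""
    return sum(bits) / len(bits)


def sc_multiply(a, b, length=4096, seed=0):
    """Approximate a * b by AND-ing two independently generated bitstreams.

    The stream length and seed are assumed values; longer streams reduce
    the random error of the estimate at the cost of more bit operations.
    """
    rng = random.Random(seed)
    stream_a = to_bitstream(a, length, rng)
    stream_b = to_bitstream(b, length, rng)
    # For independent streams, P(bit_a AND bit_b = 1) = a * b.
    product_bits = [x & y for x, y in zip(stream_a, stream_b)]
    return from_bitstream(product_bits)


if __name__ == "__main__":
    estimate = sc_multiply(0.5, 0.25)
    print(f"estimate = {estimate:.4f}, exact = {0.5 * 0.25:.4f}")
```

The estimate is approximate by construction: its error shrinks roughly with the square root of the stream length, which is the usual trade-off stochastic computing makes between precision and very cheap per-bit logic.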

Taxonomy

Topics: Neural Networks and Applications · Advanced Memory and Neural Computing · Stochastic Gradient Optimization Techniques

Methods: Softmax · Attention Is All You Need