PANTHER: A Programmable Architecture for Neural Network Training Harnessing Energy-efficient ReRAM

Aayush Ankit; Izzat El Hajj; Sai Rahul Chalamalasetti; Sapan Agarwal; Matthew Marinella; Martin Foltin; John Paul Strachan; Dejan Milojicic; Wen-mei Hwu; Kaushik Roy

arXiv:1912.11516 · cs.DC · December 30, 2019

TL;DR

PANTHER is a programmable neural network training accelerator that leverages energy-efficient ReRAM crossbars with enhanced precision techniques, significantly reducing energy consumption and training time across various neural network layers and algorithms.

Contribution

It introduces a novel bit-slicing technique for high-precision ReRAM-based outer products and develops a versatile ISA-programmable architecture with compiler support for diverse training algorithms.
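
As a rough illustration of what bit slicing means in general, the sketch below splits an integer weight into fixed-width slices, computes a dot product one slice at a time, and recombines the partial results with shift-and-add. This is a generic textbook-style example, not the paper's specific slicing scheme; the function names (`slice_weight`, `combine_slices`, `sliced_dot`) and the choice of 2-bit slices are assumptions made for illustration only.

```python
def slice_weight(value, bits_per_slice, num_slices):
    """Split a non-negative integer into little-endian slices of fixed width."""
    if value < 0 or value >= 1 << (bits_per_slice * num_slices):
        raise ValueError("value does not fit in the given slices")
    mask = (1 << bits_per_slice) - 1
    return [(value >> (i * bits_per_slice)) & mask for i in range(num_slices)]


def combine_slices(slices, bits_per_slice):
    """Recombine per-slice values with shift-and-add."""
    return sum(s << (i * bits_per_slice) for i, s in enumerate(slices))


def sliced_dot(weights, inputs, bits_per_slice, num_slices):
    """Dot product computed slice by slice, then merged with shift-and-add.

    Each slice position stands in for one low-precision array (hypothetical
    mapping); only the final merge sees the full-precision result.
    """
    sliced = [slice_weight(w, bits_per_slice, num_slices) for w in weights]
    partials = [
        sum(s[i] * x for s, x in zip(sliced, inputs))
        for i in range(num_slices)
    ]
    return combine_slices(partials, bits_per_slice)


if __name__ == "__main__":
    print(slice_weight(0xB7, 2, 4))
    print(sliced_dot([183, 5], [2, 3], 2, 4))
```

The point of the sketch is only that several low-precision cells can jointly represent one higher-precision value, as long as the per-slice results are weighted by the right power of two when merged.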

Findings

01. Achieves up to 8.02x energy reduction compared to digital accelerators.

02. Achieves up to 54.21x energy reduction compared to ReRAM accelerators.

03. Achieves up to 16x faster training compared to GPUs.

Abstract

The wide adoption of deep neural networks has been accompanied by ever-increasing energy and performance demands due to the expensive nature of training them. Numerous special-purpose architectures have been proposed to accelerate training: both digital and hybrid digital-analog using resistive RAM (ReRAM) crossbars. ReRAM-based accelerators have demonstrated the effectiveness of ReRAM crossbars at performing matrix-vector multiplication operations that are prevalent in training. However, they still suffer from inefficiency due to the use of serial reads and writes for performing the weight gradient and update step. A few works have demonstrated the possibility of performing outer products in crossbars, which can be used to realize the weight gradient and update step without the use of serial reads and writes. However, these works have been limited to low precision operations which are…
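
To make the role of the outer product concrete: for a fully connected layer y = Wx, the gradient of the loss with respect to W is the outer product of the output error vector and the input vector, so a gradient-descent step is a rank-1 update of the whole weight matrix. The sketch below shows this in plain Python. It is a software illustration of the arithmetic only, under the assumption of a single fully connected layer and plain SGD; the names `matvec` and `outer_product_update` are hypothetical and nothing here models the crossbar hardware.

```python
def matvec(weights, x):
    """Forward pass of a fully connected layer: y = W x."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in weights]


def outer_product_update(weights, delta, x, lr):
    """Apply W <- W - lr * outer(delta, x) in place.

    delta is the error at the layer output, x is the layer input.
    Every weight W[i][j] changes by -lr * delta[i] * x[j], so the update
    needs only the two vectors, not a read-modify-write of stored gradients.
    """
    for i, d in enumerate(delta):
        row = weights[i]
        for j, xj in enumerate(x):
            row[j] -= lr * d * xj
    return weights


if __name__ == "__main__":
    W = [[1.0, 2.0], [3.0, 4.0]]
    print(matvec(W, [2.0, 0.5]))
    print(outer_product_update(W, [1.0, -1.0], [2.0, 0.5], 0.5))
```

In software the double loop is still serial; the abstract's point is that a crossbar can, in principle, apply such an update across all cells at once when it is driven by the two vectors.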
