Multiscale Co-Design Analysis of Energy, Latency, Area, and Accuracy of a ReRAM Analog Neural Training Accelerator

Matthew J. Marinella; Sapan Agarwal; Alexander Hsia; Isaac Richter; Robin Jacobs-Gedrim; John Niroula; Steven J. Plimpton; Engin Ipek; Conrad D. James

arXiv:1707.09952·cs.AR·February 20, 2018

TL;DR

This paper presents a detailed analysis of an analog ReRAM-based neural network training accelerator, reporting a 270x energy reduction and a 540x latency improvement over a digital ReRAM design, and discusses methods to mitigate accuracy loss.

Contribution

The paper provides a comprehensive circuit- and device-level analysis of an analog ReRAM accelerator for neural networks, highlighting its performance benefits and avenues for improving training accuracy.

Findings

01. 270x energy reduction compared to digital ReRAM

02. 540x latency improvement over digital ReRAM

03. 11 fJ per MAC operation in the analog accelerator
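
To give the 11 fJ per MAC figure some scale, the following is a rough back-of-the-envelope sketch, not a calculation taken from the paper. It assumes the figure can be treated as a flat per-MAC cost and that a dense layer needs one MAC per weight. The 1024 x 1024 layer size and the function name are hypothetical.

```python
# Back-of-the-envelope energy estimate (assumption: flat 11 fJ per MAC,
# one MAC per weight; peripheral and data-movement overheads are ignored).
FEMTOJOULE = 1e-15
ENERGY_PER_MAC_J = 11 * FEMTOJOULE  # figure quoted in the findings


def layer_energy_joules(n_inputs, n_outputs, energy_per_mac=ENERGY_PER_MAC_J):
    """Energy for one dense vector-matrix multiply of size n_inputs x n_outputs."""
    return n_inputs * n_outputs * energy_per_mac


# Hypothetical 1024 x 1024 layer: 1,048,576 MACs per input vector.
energy = layer_energy_joules(1024, 1024)
print(f"{energy * 1e9:.2f} nJ per vector-matrix multiply")
```

Under these assumptions, one pass through such a layer costs on the order of ten nanojoules.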

Abstract

Neural networks are an increasingly attractive algorithm for natural language processing and pattern recognition. Deep networks with >50M parameters are made possible by modern GPU clusters operating at <50 pJ per operation and, more recently, production accelerators capable of <5 pJ per operation at the board level. However, with the slowing of CMOS scaling, new paradigms will be required to achieve the next several orders of magnitude in performance-per-watt gains. Using an analog resistive memory (ReRAM) crossbar to perform key matrix operations in an accelerator is an attractive option. This work presents a detailed design, using a state-of-the-art 14/16 nm PDK, of an analog crossbar circuit block designed to process three key kernels required in training and inference of neural networks. A detailed circuit and device-level analysis of energy, latency, area, and accuracy are given and…
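
The abstract does not name the three kernels. As an illustration only, the sketch below assumes they are the matrix operations commonly used in backpropagation training: a vector-matrix multiply, the transposed multiply, and a rank-1 outer-product weight update. It models an idealized crossbar in which each cell stores a conductance, and it ignores device noise, nonlinearity, and converter precision. All function and variable names are hypothetical.

```python
# Idealized crossbar model: G[i][j] is the conductance at row i, column j.
# Assumption: the three kernels are VMM, transposed MVM, and outer-product
# update; non-ideal device behavior is deliberately left out.


def vmm(G, v_rows):
    """Drive rows with voltages, sum currents on columns: I_j = sum_i v_i * G[i][j]."""
    return [sum(v_rows[i] * G[i][j] for i in range(len(G)))
            for j in range(len(G[0]))]


def mvm(G, v_cols):
    """Drive columns, sum currents on rows (transpose): I_i = sum_j G[i][j] * v_j."""
    return [sum(G[i][j] * v_cols[j] for j in range(len(G[0])))
            for i in range(len(G))]


def outer_product_update(G, x_rows, d_cols, lr):
    """Rank-1 update of every cell: G[i][j] += lr * x_i * d_j (returns a new matrix)."""
    return [[G[i][j] + lr * x_rows[i] * d_cols[j] for j in range(len(G[0]))]
            for i in range(len(G))]


G = [[1.0, 2.0], [3.0, 4.0]]
forward = vmm(G, [1.0, 1.0])    # forward pass through the layer
backward = mvm(G, [1.0, 0.0])   # error propagated back through the same weights
G_new = outer_product_update(G, [1.0, 0.0], [0.0, 1.0], 0.5)
```

In a physical crossbar each of these operations would act on all cells in parallel, which is the source of the claimed efficiency; the nested loops here only reproduce the arithmetic.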
