Acceleration of Deep Neural Network Training with Resistive Cross-Point Devices
Tayfun Gokmen, Yurii Vlasov

TL;DR
This paper proposes resistive processing unit (RPU) devices that store and update weights locally, which reduces data movement and could accelerate deep neural network training by orders of magnitude while using much less power.
Contribution
The paper proposes a novel RPU device architecture and system specifications that enable large-scale DNN training acceleration and power efficiency in CMOS-compatible technology.
Findings
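Projects a 30,000X acceleration over state-of-the-art microprocessors for DNNs with about 1 billion weights.
Projects a power efficiency of 84,000 GigaOps/s/W.
Suggests that training problems requiring days on a datacenter-size cluster could be addressed within hours on a single RPU accelerator.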
Abstract
In recent years, deep neural networks (DNNs) have demonstrated significant business impact in large-scale analysis and classification tasks such as speech recognition, visual object detection, and pattern extraction. Training of large DNNs, however, is universally considered a time-consuming and computationally intensive task that demands datacenter-scale computational resources recruited for many days. Here we propose a concept of resistive processing unit (RPU) devices that can potentially accelerate DNN training by orders of magnitude while using much less power. The proposed RPU device can store and update the weight values locally, thus minimizing data movement during training and allowing the locality and parallelism of the training algorithm to be fully exploited. We identify the RPU device and system specifications for implementation of an accelerator chip for DNN training in a…
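The idea of storing and updating weights locally can be illustrated with a small software model. The sketch below is only an assumption-laden toy, not the authors' device or circuit: the class name `CrossPointArray`, its methods, and the learning-rate parameter `lr` are hypothetical, and ideal floating-point weights stand in for device conductances. It shows the three operations of backpropagation training that such an array would carry out on the same stored weights: a forward vector-matrix product, a transposed product for the backward pass, and a rank-one update in which each cell needs only its own row and column signals.

```python
class CrossPointArray:
    """Toy model of a cross-point weight array (hypothetical, idealized)."""

    def __init__(self, rows, cols, init=0.0):
        # weights[i][j] stands in for the stored value at row i, column j.
        self.weights = [[init for _ in range(cols)] for _ in range(rows)]

    def forward(self, x):
        # Forward pass: each row sums its weighted column inputs.
        return [sum(w * xj for w, xj in zip(row, x)) for row in self.weights]

    def backward(self, delta):
        # Backward pass: transposed product over the same stored weights.
        n_rows = len(self.weights)
        n_cols = len(self.weights[0])
        return [
            sum(self.weights[i][j] * delta[i] for i in range(n_rows))
            for j in range(n_cols)
        ]

    def update(self, x, delta, lr):
        # Rank-one (outer product) update: cell (i, j) changes using only
        # its row signal delta[i] and its column signal x[j].
        for i, row in enumerate(self.weights):
            for j in range(len(row)):
                row[j] += lr * delta[i] * x[j]


# Usage: a 2x3 array updated once, then read in both directions.
arr = CrossPointArray(2, 3)
arr.update(x=[1.0, 2.0, 3.0], delta=[1.0, -1.0], lr=0.5)
outputs = arr.forward([1.0, 0.0, 0.0])
back = arr.backward([1.0, 1.0])
```

In this toy model no weight is ever copied out of the array. In hardware, every cell could apply its update at the same time, which is the locality and parallelism the abstract refers to; the Python loops here run sequentially and only mimic the data flow.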
