# FPSA: A Full System Stack Solution for Reconfigurable ReRAM-based NN Accelerator Architecture

**Authors:** Yu Ji, Youyang Zhang, Xinfeng Xie, Shuangchen Li, Peiqi Wang, Xing Hu, Youhui Zhang, Yuan Xie

arXiv: 1901.09904 · 2019-01-30

## TL;DR

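This paper introduces FPSA (Field Programmable Synapse Array), a full system stack for ReRAM-based neural network accelerators that pairs a reconfigurable architecture with a software system (neural synthesizer, temporal-to-spatial mapper, and placement & routing). Compared with PRIME, a state-of-the-art ReRAM-based accelerator, it reports 31x higher computational density and up to 1000x faster inference on representative NNs.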

## Contribution

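The paper co-designs a reconfigurable ReRAM-based architecture with an end-to-end software system. Spiking-schema PEs and a neural synthesizer raise computational density, a reconfigurable routing architecture with a placement & routing tool serves the communication demand among PEs, and spiking memory blocks (SMBs) and configurable logic blocks (CLBs), driven by a temporal-to-spatial mapper, balance storage and computation.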

## Key findings

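- 31x higher computational density than PRIME, a state-of-the-art ReRAM-based NN accelerator
- up to 1000x inference speedup over PRIME on representative NNs
- existing deep neural networks can be deployed to FPSA through the end-to-end software system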

## Abstract

Neural Network (NN) accelerators with emerging ReRAM (resistive random access memory) technologies have been investigated as one of the promising solutions to address the *memory wall* challenge, due to the unique capability of *processing-in-memory* within ReRAM-crossbar-based processing elements (PEs). However, the high efficiency and high density advantages of ReRAM have not been fully utilized due to the huge communication demands among PEs and the overhead of peripheral circuits.

In this paper, we propose a full system stack solution, composed of a reconfigurable architecture design, Field Programmable Synapse Array (FPSA), and its software system including neural synthesizer, temporal-to-spatial mapper, and placement & routing. We rely heavily on the software system to make the hardware design compact and efficient. To satisfy the high-performance communication demand, we optimize it with a reconfigurable routing architecture and the placement & routing tool. To improve the computational density, we greatly simplify the PE circuit with the spiking schema and then adopt the neural synthesizer to enable the high-density computation resources to support different kinds of NN operations. In addition, we provide spiking memory blocks (SMBs) and configurable logic blocks (CLBs) in hardware and leverage the temporal-to-spatial mapper to utilize them to balance the storage and computation requirements of NNs. Owing to the end-to-end software system, we can efficiently deploy existing deep neural networks to FPSA. Evaluations show that, compared to one of the state-of-the-art ReRAM-based NN accelerators, PRIME, the computational density of FPSA improves by 31x; for representative NNs, its inference performance can achieve up to 1000x speedup.
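To make the idea of a spiking, crossbar-based PE more concrete, the following is a minimal Python sketch of a vector-matrix multiply carried out with spike counts. It is not the authors' circuit: the abstract does not give the details of the spiking schema, so the rate coding, the integrate-and-fire columns with subtractive reset, the non-negative conductances, and every name here (`encode_spike_train`, `crossbar_spiking_vmm`, `window`, `threshold`) are assumptions chosen for illustration.

```python
def encode_spike_train(value, window):
    """Rate-code a value in [0, 1] as `window` time steps of 0/1 spikes.

    Assumption: spikes are simply packed at the start of the window.
    """
    count = round(value * window)
    return [1 if t < count else 0 for t in range(window)]


def crossbar_spiking_vmm(inputs, conductances, window=16, threshold=1.0):
    """Approximate inputs @ conductances using spike counts.

    conductances[i][j] is the (non-negative) cell linking input row i to
    output column j. Each column integrates the current of the rows that
    spike in a time step and fires whenever the accumulated value reaches
    `threshold` (hypothetical integrate-and-fire model, subtractive reset).
    """
    n_rows = len(conductances)
    n_cols = len(conductances[0])
    trains = [encode_spike_train(x, window) for x in inputs]
    membrane = [0.0] * n_cols
    out_counts = [0] * n_cols

    for t in range(window):
        for j in range(n_cols):
            # Column current: sum of conductances on rows spiking at step t.
            current = sum(
                conductances[i][j] for i in range(n_rows) if trains[i][t]
            )
            membrane[j] += current
            # Fire as many output spikes as the accumulated value allows.
            while membrane[j] >= threshold:
                membrane[j] -= threshold
                out_counts[j] += 1
    return out_counts


if __name__ == "__main__":
    weights = [[1.0, 0.5],
               [0.5, 0.25]]
    counts = crossbar_spiking_vmm([0.5, 1.0], weights, window=8)
    print(counts)  # [8, 4]
```

In this toy model the output spike counts `[8, 4]` equal the ideal product `[1.0, 0.5]` scaled by `window / threshold`, which illustrates why spike-count signalling can stand in for multi-bit analog conversion at the PE boundary. How FPSA actually realizes this is described in the full paper.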

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1901.09904/full.md

## Figures

13 figures with captions in the complete paper: https://tomesphere.com/paper/1901.09904/full.md

## References

51 references — full list in the complete paper: https://tomesphere.com/paper/1901.09904/full.md

---
Source: https://tomesphere.com/paper/1901.09904