Compute RAMs: Adaptable Compute and Storage Blocks for DL-Optimized FPGAs

Aman Arora; Bagus Hanindhito; Lizy K. John

arXiv:2107.09178·cs.AR·October 1, 2021

TL;DR

This paper introduces Compute RAMs, a new adaptable block for FPGAs that combines processing and storage, significantly improving energy efficiency, bandwidth, and compute density for deep learning and other applications.

Contribution

The paper proposes Compute RAMs as a novel FPGA block that enables in-memory processing with dynamic operation modes, enhancing flexibility and performance.

Findings

01. 80% average energy savings in DL operations

02. 20% to 80% improvement in execution time

03. Enhanced FPGA compute density and adaptability

Abstract

The configurable building blocks of current FPGAs -- Logic blocks (LBs), Digital Signal Processing (DSP) slices, and Block RAMs (BRAMs) -- make them efficient hardware accelerators for the rapidly changing world of Deep Learning (DL). Communication between these blocks happens through an interconnect fabric consisting of switching elements spread throughout the FPGA. In this paper, a new block, Compute RAM, is proposed. Compute RAMs provide highly parallel processing-in-memory (PIM) by combining computation and storage capabilities in one block. Compute RAMs can be integrated in the FPGA fabric just like the existing FPGA blocks and provide two modes of operation (storage or compute) that can be dynamically chosen. They reduce power consumption by reducing data movement, provide adaptable precision support, and increase the effective on-chip memory bandwidth. Compute RAMs also help…
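The two-mode idea in the abstract can be illustrated with a toy software model. This is only a sketch under stated assumptions, not the paper's circuit or programming interface: the class `ToyComputeRAM`, its methods, and the `words_moved` counter are hypothetical names introduced here, and the model counts words crossing the block boundary as a rough stand-in for data movement. It says nothing about actual energy or timing.

```python
from enum import Enum


class Mode(Enum):
    STORAGE = "storage"
    COMPUTE = "compute"


class ToyComputeRAM:
    """Hypothetical toy block: plain storage, or compute on stored words in place."""

    def __init__(self, num_words):
        self.words = [0] * num_words
        self.mode = Mode.STORAGE
        self.words_moved = 0  # words that crossed the block boundary

    def set_mode(self, mode):
        # The mode is chosen at run time, mirroring "dynamically chosen".
        self.mode = mode

    def write(self, addr, value):
        if self.mode is not Mode.STORAGE:
            raise RuntimeError("write is only modeled in storage mode")
        self.words[addr] = value
        self.words_moved += 1

    def read(self, addr):
        if self.mode is not Mode.STORAGE:
            raise RuntimeError("read is only modeled in storage mode")
        self.words_moved += 1
        return self.words[addr]

    def add_in_place(self, src_a, src_b, dst, count):
        # Compute mode: operands never leave the block, so words_moved is
        # untouched. The loop is sequential here; a real PIM block would
        # operate on many elements in parallel.
        if self.mode is not Mode.COMPUTE:
            raise RuntimeError("add_in_place is only modeled in compute mode")
        for i in range(count):
            self.words[dst + i] = self.words[src_a + i] + self.words[src_b + i]


def external_add(ram, src_a, src_b, dst, count):
    # Baseline: read both operands out, add outside the block, write back.
    for i in range(count):
        total = ram.read(src_a + i) + ram.read(src_b + i)
        ram.write(dst + i, total)


def demo(n=4):
    ram = ToyComputeRAM(3 * n)
    for i in range(n):
        ram.write(i, i)            # operand vector a
        ram.write(n + i, 10 * i)   # operand vector b
    loaded = ram.words_moved

    external_add(ram, 0, n, 2 * n, n)
    external_cost = ram.words_moved - loaded

    ram.set_mode(Mode.COMPUTE)
    before = ram.words_moved
    ram.add_in_place(0, n, 2 * n, n)
    in_place_cost = ram.words_moved - before

    return external_cost, in_place_cost, ram.words[2 * n:3 * n]
```

Calling `demo(4)` compares the two paths on the same data: the external baseline moves three words per element across the block boundary, while the in-place path moves none in this model.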

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Embedded Systems Design Techniques · Low-power high-performance VLSI design