Compute RAMs: Adaptable Compute and Storage Blocks for DL-Optimized FPGAs
Aman Arora, Bagus Hanindhito, Lizy K. John

TL;DR
This paper introduces Compute RAMs, a new adaptable block for FPGAs that combines processing and storage, significantly improving energy efficiency, bandwidth, and compute density for deep learning and other applications.
Contribution
The paper proposes Compute RAMs as a novel FPGA block that enables in-memory processing with dynamic operation modes, enhancing flexibility and performance.
Findings
80% average energy savings in DL operations
20% to 80% reduction in execution time
Enhanced FPGA compute density and adaptability
Abstract
The configurable building blocks of current FPGAs -- Logic blocks (LBs), Digital Signal Processing (DSP) slices, and Block RAMs (BRAMs) -- make them efficient hardware accelerators for the rapidly changing world of Deep Learning (DL). Communication between these blocks happens through an interconnect fabric consisting of switching elements spread throughout the FPGA. In this paper, a new block, Compute RAM, is proposed. Compute RAMs provide highly parallel processing-in-memory (PIM) by combining computation and storage capabilities in one block. Compute RAMs can be integrated in the FPGA fabric just like the existing FPGA blocks and provide two modes of operation (storage or compute) that can be dynamically chosen. They reduce power consumption by reducing data movement, provide adaptable precision support, and increase the effective on-chip memory bandwidth. Compute RAMs also help…
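To make "combining computation and storage in one block" and "adaptable precision" more concrete, here is a toy Python model of a dual-mode memory block. It assumes a bit-serial, element-parallel scheme: operands are stored transposed, so each column holds one element and each row holds one bit of every element, and a single row-wide logic step updates all columns at once. The class name `ComputeRAMModel`, its methods, and the transposed layout are illustrative assumptions, not the paper's interface or circuit. The sketch only shows why such a block exposes parallelism across columns and why precision is adjustable: an n-bit add takes n row steps, whatever n is.

```python
from typing import List


class ComputeRAMModel:
    """Hypothetical toy model of a dual-mode (storage/compute) memory block.

    Assumption: data is stored transposed (column c holds element c, row r
    holds bit r of every element), so each row-wide boolean step acts on
    all columns in parallel. Illustrative only, not the paper's design.
    """

    def __init__(self, rows: int, cols: int) -> None:
        self.rows = rows
        self.cols = cols
        self.mode = "storage"  # blocks start out as plain memory
        self.array = [[0] * cols for _ in range(rows)]

    def set_mode(self, mode: str) -> None:
        # The mode can be switched at run time.
        if mode not in ("storage", "compute"):
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode

    def write_vector(self, base_row: int, values: List[int], bits: int) -> None:
        # Store each value's bits down one column, LSB at base_row.
        for c, v in enumerate(values):
            for b in range(bits):
                self.array[base_row + b][c] = (v >> b) & 1

    def read_vector(self, base_row: int, bits: int) -> List[int]:
        # Reassemble each column's bits into an integer.
        return [
            sum(self.array[base_row + b][c] << b for b in range(bits))
            for c in range(self.cols)
        ]

    def add(self, a_row: int, b_row: int, out_row: int, bits: int) -> int:
        """Bit-serial add of two `bits`-wide vectors; returns row steps used."""
        if self.mode != "compute":
            raise RuntimeError("block must be in compute mode")
        carry = [0] * self.cols
        steps = 0
        for b in range(bits):
            a = self.array[a_row + b]
            x = self.array[b_row + b]
            # One row-wide step: every column computes its sum bit at once.
            self.array[out_row + b] = [ai ^ xi ^ ci for ai, xi, ci in zip(a, x, carry)]
            carry = [(ai & xi) | (ci & (ai ^ xi)) for ai, xi, ci in zip(a, x, carry)]
            steps += 1
        self.array[out_row + bits] = carry  # carry-out becomes the top bit
        return steps


if __name__ == "__main__":
    ram = ComputeRAMModel(rows=32, cols=4)
    ram.write_vector(0, [3, 10, 7, 255], bits=8)
    ram.write_vector(8, [4, 5, 9, 1], bits=8)
    ram.set_mode("compute")
    steps = ram.add(0, 8, 16, bits=8)
    print(ram.read_vector(16, bits=9), steps)  # [7, 15, 16, 256] 8
```

In this model, adding more columns adds parallel lanes at no extra step cost, while halving the operand width halves the number of row steps. That trade-off is one plausible way to read "adaptable precision support", offered here as an assumption rather than as the paper's mechanism.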
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Embedded Systems Design Techniques · Low-power high-performance VLSI design
