TL;DR
This paper challenges existing beliefs about FPGA-based PIM accelerators by introducing IMAGine, a design that clocks at the maximum BRAM frequency and scales to all available BRAMs, setting a new performance standard for general matrix-vector multiplication (GEMV).
Contribution
The paper defines a Gold Standard for PIM FPGA designs and demonstrates IMAGine as a practical implementation that surpasses prior PIM accelerators in speed and scalability.
Findings
IMAGine clocks at the maximum BRAM frequency and scales to 100% of the available BRAMs.
IMAGine's clock is 2.65x to 3.2x faster than that of existing PIM GEMV engines.
Outperforms TPU v1-v2 and Alibaba Hanguang 800 in clock speed.
Abstract
Many recent FPGA-based Processor-in-Memory (PIM) architectures have appeared with promises of impressive levels of parallelism but with performance that falls short of expectations due to reduced maximum clock frequencies, an inability to scale processing elements up to the maximum BRAM capacity, and minimal hardware support for large reduction operations. In this paper, we first establish what we believe should be a "Gold Standard" set of design objectives for PIM-based FPGA designs. This Gold Standard was established to serve as an absolute metric for comparing PIMs developed on different technology nodes and vendor families as well as an aspirational goal for designers. We then present IMAGine, an In-Memory Accelerated GEMV engine used as a case study to show the Gold Standard can be realized in practice. IMAGine serves as an existence proof that dispels several myths surrounding…
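The abstract names hardware support for large reduction operations as a weak point of earlier PIM designs. As a rough illustration of why GEMV needs such a reduction, the sketch below splits the matrix columns across several processing blocks, has each block compute a partial result, and then sums the partials pairwise. This is a generic software model under stated assumptions, not IMAGine's architecture: the function names, the column-wise partitioning, and the pairwise reduction order are all hypothetical choices made for the example.

```python
def gemv_reference(matrix, vector):
    """Plain GEMV: y[i] = sum_j matrix[i][j] * vector[j]."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def split_columns(n_cols, n_blocks):
    """Return (start, stop) column ranges, one per hypothetical PE block.

    Assumes n_blocks >= 1. Blocks may be empty if n_blocks > n_cols.
    """
    base, extra = divmod(n_cols, n_blocks)
    ranges, start = [], 0
    for b in range(n_blocks):
        stop = start + base + (1 if b < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def partial_gemv(matrix, vector, start, stop):
    """Partial result of one block: only columns start..stop-1 contribute."""
    return [sum(row[j] * vector[j] for j in range(start, stop)) for row in matrix]


def tree_reduce(partials):
    """Sum partial vectors pairwise, level by level (assumed reduction order)."""
    level = list(partials)
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            nxt.append([a + b for a, b in zip(level[i], level[i + 1])])
        if len(level) % 2:
            # An odd leftover partial is carried to the next level unchanged.
            nxt.append(level[-1])
        level = nxt
    return level[0]


def gemv_blocked(matrix, vector, n_blocks):
    """GEMV as per-block partial products followed by a reduction."""
    ranges = split_columns(len(vector), n_blocks)
    partials = [partial_gemv(matrix, vector, s, e) for s, e in ranges]
    return tree_reduce(partials)


if __name__ == "__main__":
    m = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
    v = [1, 0, -1, 2]
    print(gemv_blocked(m, v, n_blocks=3))
```

In this model the number of reduction levels grows with the logarithm of the block count, which suggests why reduction cost matters more as a design scales toward using every BRAM.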
