OpenGeMM: A High-Utilization GeMM Accelerator Generator with Lightweight RISC-V Control and Tight Memory Coupling
Xiaoling Yi, Ryan Antonio, Joren Dumoulin, Jiacong Sun, Josse Van Delm, Guilherme Paim, Marian Verhelst

TL;DR
OpenGeMM is an open-source GeMM accelerator platform for neural network workloads. It combines high efficiency, high utilization, and programmability through lightweight RISC-V control and tight memory coupling.
Contribution
It introduces a configurable GeMM accelerator with RISC-V control and novel memory access strategies, achieving high utilization and efficiency on neural network workloads.
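The core is described as configurable, meaning its array dimensions are design parameters. As a rough illustration of how a fixed-size GeMM array handles a larger matrix product, the NumPy sketch below tiles C = A·B into array-sized blocks and counts the array steps. The tile sizes `Mu`, `Ku`, `Nu`, the helper names, and the loop order are assumptions for illustration. This is a software model only, not the Chisel generator or its actual dataflow.

```python
import numpy as np


def round_up(x, tile):
    """Smallest multiple of `tile` that is >= x."""
    return -(-x // tile) * tile


def tiled_gemm(A, B, Mu=8, Ku=8, Nu=8):
    """Model C = A @ B on a hypothetical Mu x Ku x Nu GeMM array.

    Each array "step" multiplies an (Mu x Ku) tile of A by a (Ku x Nu) tile
    of B and accumulates into an (Mu x Nu) tile of C. Returns the result,
    the number of array steps, and the fraction of MAC slots doing useful work.
    """
    M, K = A.shape
    K2, N = B.shape
    assert K == K2, "inner dimensions must match"

    # Zero-pad each dimension to a whole number of tiles. Padding leaves the
    # result unchanged, but the array still spends steps on the padded zeros.
    Mp, Kp, Np = round_up(M, Mu), round_up(K, Ku), round_up(N, Nu)
    Ap = np.zeros((Mp, Kp), dtype=np.int64)
    Bp = np.zeros((Kp, Np), dtype=np.int64)
    Ap[:M, :K] = A
    Bp[:K, :N] = B
    C = np.zeros((Mp, Np), dtype=np.int64)

    steps = 0
    for m in range(0, Mp, Mu):
        for n in range(0, Np, Nu):
            # Innermost K loop: the output tile stays put and accumulates.
            for k in range(0, Kp, Ku):
                C[m:m + Mu, n:n + Nu] += Ap[m:m + Mu, k:k + Ku] @ Bp[k:k + Ku, n:n + Nu]
                steps += 1

    useful_macs = M * K * N
    utilization = useful_macs / (steps * Mu * Ku * Nu)
    return C[:M, :N], steps, utilization
```

For example, multiplying a 10×16 matrix by a 16×16 matrix on the assumed 8×8×8 array pads M from 10 to 16, so the model takes 8 steps and reports 0.625 utilization while still producing exactly `A @ B`. Shape mismatch with the array is one simple way utilization can fall below 100%. This summary does not say which factors dominate in OpenGeMM's measurements.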
Findings
Achieves 81.89% to 99.34% hardware utilization across CNN and Transformer workloads.
Demonstrates 3.58x to 16.40x speedup over state-of-the-art open-source accelerators.
Attains 4.68 TOPS/W system efficiency.
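The findings quote utilization, speedup, and TOPS/W figures but do not give the formulas here. The helpers below use common textbook definitions, which are an assumption. The paper's exact methodology (which cycles are counted, how power is measured, how operations are tallied) may differ. The example inputs are hypothetical and are not results from the paper.

```python
def utilization(useful_macs, peak_macs_per_cycle, cycles):
    """Fraction of the array's peak MAC throughput that did useful work."""
    return useful_macs / (peak_macs_per_cycle * cycles)


def speedup(baseline_cycles, new_cycles):
    """How many times fewer cycles the new design needs for the same workload."""
    return baseline_cycles / new_cycles


def tops_per_watt(useful_macs, seconds, watts, ops_per_mac=2):
    """Energy efficiency in TOPS/W, counting each MAC as `ops_per_mac` ops
    (2 = one multiply plus one add, a common but assumed convention)."""
    ops_per_second = useful_macs * ops_per_mac / seconds
    return ops_per_second / watts / 1e12


if __name__ == "__main__":
    # Hypothetical numbers, not measurements from the paper:
    # a 64x64x64 GeMM on a 512-MAC/cycle array finishing in 520 cycles.
    macs = 64 * 64 * 64
    print(utilization(macs, 512, 520))    # ≈ 0.9846 (= 64/65)
    print(speedup(1000, 250))             # 4.0
    print(tops_per_watt(1e9, 1e-3, 0.5))  # ≈ 4.0
```

Under these definitions, a 99.34% utilization would mean fewer than 1% of the array's MAC slots sit idle over the measured run. Whether that matches the paper's accounting is not stated in this summary.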
Abstract
Deep neural networks (DNNs) face significant challenges when deployed on resource-constrained extreme edge devices due to their computational and data-intensive nature. While standalone accelerators tailored for specific application scenarios suffer from inflexible control and limited programmability, generic hardware acceleration platforms coupled with RISC-V CPUs can enable high reusability and flexibility, yet typically at the expense of system-level efficiency and utilization. To fill this gap, we propose OpenGeMM, an open-source acceleration platform that jointly demonstrates high efficiency and utilization as well as ease of configurability and programmability. OpenGeMM encompasses a parameterized Chisel-coded GeMM accelerator, a lightweight RISC-V processor, and a tightly coupled multi-banked scratchpad memory. The GeMM core utilization and system efficiency are boosted through…
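The abstract describes a tightly coupled multi-banked scratchpad. The sketch below is a generic model of why banking matters for feeding a GeMM array every cycle. It assumes 32 banks of 64-bit words with word-interleaved addressing. Those numbers, the mapping, and the helper names are hypothetical and do not describe OpenGeMM's actual memory configuration or access strategy.

```python
from collections import Counter


def bank_of(addr, num_banks=32, word_bytes=8):
    """Word-interleaved mapping: consecutive words land in consecutive banks."""
    return (addr // word_bytes) % num_banks


def access_cycles(addrs, num_banks=32, word_bytes=8):
    """Cycles to serve one batch of parallel requests.

    Simplified model: each bank returns one word per cycle, and requests that
    map to the same bank (even to the same address) are served one after another.
    """
    per_bank = Counter(bank_of(a, num_banks, word_bytes) for a in addrs)
    return max(per_bank.values()) if per_bank else 0


WORD = 8
unit_stride = [i * WORD for i in range(32)]         # 32 consecutive words
row_stride = [i * 32 * WORD for i in range(32)]     # stride of 32 words
padded_stride = [i * 33 * WORD for i in range(32)]  # stride padded to 33 words

print(access_cycles(unit_stride))    # 1
print(access_cycles(row_stride))     # 32
print(access_cycles(padded_stride))  # 1
```

A stride equal to the bank count sends every request to the same bank and serializes the batch. Padding the stride to 33 words spreads the requests across all banks again. This is a standard trick; the summary does not state whether OpenGeMM's memory access strategies use anything like it.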
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies · CCD and CMOS Imaging Sensors
Methods: Attention Is All You Need · Absolute Position Encodings · Label Smoothing · Adam · Residual Connection · Softmax · Linear Layer · Dropout · Layer Normalization · Multi-Head Attention
