RedMulE: A Compact FP16 Matrix-Multiplication Accelerator for Adaptive Deep Learning on RISC-V-Based Ultra-Low-Power SoCs
Yvan Tortorella, Luca Bertaccini, Davide Rossi, Luca Benini, Francesco Conti

TL;DR
RedMulE is a compact, low-power FP16 matrix-multiplication accelerator for RISC-V-based ultra-low-power SoCs. It speeds up deep learning workloads at the edge and delivers up to 4.65x higher energy efficiency than software execution.
Contribution
The paper introduces RedMulE, a novel hardware accelerator for FP16 matrix multiplication optimized for ultra-low-power RISC-V SoCs, enabling efficient deep learning training and inference.
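To make the workload concrete, here is a minimal software sketch of an FP16 matrix multiplication built from multiply-accumulate steps. It is only an illustration of the kernel being accelerated, not RedMulE's datapath: the helper names `to_fp16` and `matmul_fp16` are hypothetical, and the FP16 accumulation order is an assumption.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 binary16 (FP16) value.

    Raises OverflowError if x is too large for the FP16 range.
    """
    return struct.unpack("<e", struct.pack("<e", x))[0]


def matmul_fp16(a, b):
    """Multiply an (m x n) matrix by an (n x k) matrix, rounding to FP16.

    Returns the result and the number of multiply-accumulate (MAC) steps.
    Assumption: each step computes acc + a*b at higher precision and rounds
    once to FP16, which approximates a fused multiply-add.
    """
    m, n, k = len(a), len(b), len(b[0])
    out = [[0.0] * k for _ in range(m)]
    macs = 0
    for i in range(m):
        for j in range(k):
            acc = 0.0
            for p in range(n):
                acc = to_fp16(acc + to_fp16(a[i][p]) * to_fp16(b[p][j]))
                macs += 1
            out[i][j] = acc
    return out, macs


result, macs = matmul_fp16([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(result, macs)
```

The MAC count grows as m * n * k, which is why this kernel dominates both training and inference cost.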
Findings
Achieves 31.6 MAC/cycle throughput at 666 MHz
Consumes only 43.5 mW of power at the cluster level
Provides up to 4.65x higher energy efficiency compared to software execution
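The throughput figure above can be turned into absolute numbers with a little arithmetic. The sketch below assumes that one MAC counts as two floating-point operations and that the peak rate is one MAC per FMA unit per cycle (the abstract describes a 32-FMA instance). The `ideal_cycles` helper is hypothetical and ignores setup, tiling, and memory effects.

```python
# Figures quoted in this summary.
MAC_PER_CYCLE = 31.6   # sustained throughput (Findings)
CLOCK_HZ = 666e6       # operating frequency (Findings)
NUM_FMA = 32           # FMA units in the instance described in the abstract

# Assumption: one MAC = 2 floating-point operations (multiply + add).
FLOP_PER_MAC = 2

gmac_per_s = MAC_PER_CYCLE * CLOCK_HZ / 1e9
gflops = gmac_per_s * FLOP_PER_MAC

# Assumption: peak rate is 1 MAC per FMA unit per cycle.
utilization = MAC_PER_CYCLE / NUM_FMA


def ideal_cycles(m: int, n: int, k: int) -> float:
    """Idealized cycle count for an (m x n) by (n x k) matrix multiplication."""
    return m * n * k / MAC_PER_CYCLE


print(f"{gmac_per_s:.2f} GMAC/s, {gflops:.2f} GFLOPS")
print(f"utilization: {utilization:.4f}")
print(f"32x32x32 matmul: about {ideal_cycles(32, 32, 32):.0f} cycles")
```

The 43.5 mW power figure is deliberately left out of this calculation, because the summary does not state whether it was measured at the same operating point as the 666 MHz frequency.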
Abstract
The fast proliferation of extreme-edge applications using Deep Learning (DL) based algorithms required dedicated hardware to satisfy extreme-edge applications' latency, throughput, and precision requirements. While inference is achievable in practical cases, online finetuning and adaptation of general DL models are still highly challenging. One of the key stumbling stones is the need for parallel floating-point operations, which are considered unaffordable on sub-100 mW extreme-edge SoCs. We tackle this problem with RedMulE (Reduced-precision matrix Multiplication Engine), a parametric low-power hardware accelerator for FP16 matrix multiplications - the main kernel of DL training and inference - conceived for tight integration within a cluster of tiny RISC-V cores based on the PULP (Parallel Ultra-Low-Power) architecture. In 22 nm technology, a 32-FMA RedMulE instance occupies just 0.07…
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Ferroelectric and Negative Capacitance Devices · Semiconductor materials and devices
