Dustin: A 16-Cores Parallel Ultra-Low-Power Cluster with 2b-to-32b Fully Flexible Bit-Precision and Vector Lockstep Execution Mode
Gianmarco Ottavi, Angelo Garofalo, Giuseppe Tagliavini, Francesco Conti, Alfio Di Mauro, Luca Benini, Davide Rossi

TL;DR
Dustin is a 16-core RISC-V cluster for low-power computation at flexible bit precision (2 to 32 bits). Its novel Vector Lockstep Execution Mode (VLEM) reduces power consumption on data-parallel kernels with minimal performance loss, which makes the cluster suitable for edge AI applications.
Contribution
This paper introduces Dustin, a fully programmable 16-core cluster with flexible bit-precision and a new vector lockstep execution mode to improve power efficiency for data-parallel workloads.
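The abstract states that Dustin's cores support 2- to 32-bit arithmetic and every mixed-precision permutation, but it does not describe how operands are laid out in registers. The Python sketch below illustrates the general idea of a packed SIMD dot product whose two operands have different lane widths (here 4-bit activations against 2-bit weights in 32-bit words). It is not Dustin's ISA. `pack`, `unpack`, `mixed_sdotp`, the least-significant-lane-first ordering, the choice of unsigned activations and signed weights, and the `chunk` parameter used to select which slice of the narrower operand is consumed are all hypothetical.

```python
def unpack(word: int, bits: int, signed: bool) -> list[int]:
    """Split a 32-bit word into 32 // bits lanes; lane 0 sits in the least significant bits."""
    assert 32 % bits == 0, "lane width must divide the 32-bit word"
    mask = (1 << bits) - 1
    lanes = []
    for i in range(32 // bits):
        v = (word >> (i * bits)) & mask
        if signed and v >= 1 << (bits - 1):
            v -= 1 << bits  # two's-complement sign extension
        lanes.append(v)
    return lanes


def pack(values: list[int], bits: int) -> int:
    """Pack small signed or unsigned integers into one 32-bit word, lane 0 first."""
    assert len(values) * bits <= 32, "too many lanes for a 32-bit word"
    mask = (1 << bits) - 1
    word = 0
    for i, v in enumerate(values):
        word |= (v & mask) << (i * bits)
    return word


def mixed_sdotp(acc: int, act_word: int, act_bits: int,
                wgt_word: int, wgt_bits: int, chunk: int = 0) -> int:
    """Return acc + dot(unsigned activations, signed weights) with different lane widths.

    One call consumes every activation lane in act_word and the same number of
    weight lanes, taken from the chunk-th slice of the (narrower) weight word.
    """
    assert act_bits >= wgt_bits, "sketch assumes weights are the narrower operand"
    acts = unpack(act_word, act_bits, signed=False)
    n = len(acts)  # lanes processed per call
    wgts = unpack(wgt_word, wgt_bits, signed=True)[chunk * n:(chunk + 1) * n]
    return acc + sum(a * w for a, w in zip(acts, wgts))


acts = [1, 2, 3, 4, 5, 6, 7, 8]          # 4-bit unsigned, 8 lanes per word
wgts = [1, -1, 0, 1, -2, 1, 1, 0,        # 2-bit signed, 16 lanes per word
        -1, -1, 1, 0, 0, 1, -2, 1]
a_word, w_word = pack(acts, 4), pack(wgts, 2)
print(mixed_sdotp(0, a_word, 4, w_word, 2, chunk=0))  # 6
```

Because one 32-bit word of 2-bit weights holds twice as many lanes as a word of 4-bit activations, a single weight load can feed two dot-product calls (`chunk=0` and `chunk=1`). This is the kind of register reuse that makes sub-byte formats attractive for memory footprint.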
Findings
38% power reduction in Vector Lockstep Execution Mode, with minimal performance overhead
Achieves 58 GOPS peak performance
Reaches 1.15 TOPS/W peak efficiency
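The efficiency and power figures above can be turned into a quick back-of-the-envelope estimate. The first line only converts units (1 TOPS/W equals 10^12 operations per joule). The second line is an estimate that rests on two assumptions, because the summary does not specify the baseline: the 38% saving is assumed to be measured against the conventional MIMD mode on the same kernel, and δ stands for the unspecified relative runtime overhead of VLEM.

```latex
\frac{1}{1.15\ \mathrm{TOPS/W}}
  = \frac{1\ \mathrm{J}}{1.15 \times 10^{12}\ \mathrm{op}}
  \approx 0.87\ \mathrm{pJ/op}

P_{\mathrm{VLEM}} \approx (1 - 0.38)\, P_{\mathrm{MIMD}} = 0.62\, P_{\mathrm{MIMD}},
\qquad
\frac{E_{\mathrm{VLEM}}}{E_{\mathrm{MIMD}}}
  = \frac{P_{\mathrm{VLEM}}\, t_{\mathrm{MIMD}}\,(1 + \delta)}{P_{\mathrm{MIMD}}\, t_{\mathrm{MIMD}}}
  \approx 0.62\,(1 + \delta)
```

Under these assumptions, a small δ means the energy per kernel also drops by close to 38%.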
Abstract
Computationally intensive algorithms such as Deep Neural Networks (DNNs) are becoming killer applications for edge devices. Porting heavily data-parallel algorithms on resource-constrained and battery-powered devices poses several challenges related to memory footprint, computational throughput, and energy efficiency. Low-bitwidth and mixed-precision arithmetic have been proven to be valid strategies for tackling these problems. We present Dustin, a fully programmable compute cluster integrating 16 RISC-V cores capable of 2- to 32-bit arithmetic and all possible mixed-precision permutations. In addition to a conventional Multiple-Instruction Multiple-Data (MIMD) processing paradigm, Dustin introduces a Vector Lockstep Execution Mode (VLEM) to minimize power consumption in highly data-parallel kernels. In VLEM, a single leader core fetches instructions and broadcasts them to the 15…
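The abstract describes VLEM as a single leader core fetching instructions and broadcasting them to the other 15 cores. The toy Python model below is a minimal sketch of why this saves power on data-parallel code. It runs the same two-instruction program on 16 cores, each holding different data, once in MIMD style (every core fetches its own instructions) and once in lockstep style (only core 0 fetches). It then counts instruction fetches. The instruction format, the `addi`/`muli` opcodes, and the helper names are hypothetical. The model ignores caches, timing, and how Dustin enters or leaves VLEM. Treating the non-leader cores' fetch stages as idle is an assumption of the sketch.

```python
from dataclasses import dataclass, field

# Hypothetical toy instruction: (opcode, destination reg, source reg, immediate)
Instr = tuple[str, str, str, int]


@dataclass
class Core:
    core_id: int
    regs: dict[str, int] = field(default_factory=dict)
    fetches: int = 0  # instructions fetched by this core's own fetch stage

    def execute(self, instr: Instr) -> None:
        op, dst, src, imm = instr
        if op == "addi":
            self.regs[dst] = self.regs.get(src, 0) + imm
        elif op == "muli":
            self.regs[dst] = self.regs.get(src, 0) * imm
        else:
            raise ValueError(f"unknown opcode {op!r}")


def make_cluster(n_cores: int = 16) -> list[Core]:
    # Same program on every core, different data: x1 holds the core id.
    return [Core(i, {"x1": i}) for i in range(n_cores)]


def run_mimd(cores: list[Core], program: list[Instr]) -> None:
    # MIMD: every core fetches and executes the instruction stream on its own.
    for core in cores:
        for instr in program:
            core.fetches += 1
            core.execute(instr)


def run_lockstep(cores: list[Core], program: list[Instr]) -> None:
    # Lockstep: only the leader (core 0) fetches; each instruction is broadcast
    # to all cores, whose own fetch stages stay idle (assumption of this model).
    leader = cores[0]
    for instr in program:
        leader.fetches += 1
        for core in cores:
            core.execute(instr)


program: list[Instr] = [("muli", "x2", "x1", 3), ("addi", "x2", "x2", 1)]

mimd, vlem = make_cluster(), make_cluster()
run_mimd(mimd, program)
run_lockstep(vlem, program)

print(sum(c.fetches for c in mimd))  # 32
print(sum(c.fetches for c in vlem))  # 2
```

Both runs leave identical register contents (core i ends with x2 = 3i + 1), but lockstep issues one fetch per instruction instead of one per core. Instruction fetch is only one contributor to total power, so a 16x drop in fetch count does not imply a 16x power saving. The paper's summary reports 38%.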
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Low-power high-performance VLSI design
