SPEED: A Scalable RISC-V Vector Processor Enabling Efficient Multi-Precision DNN Inference

Chuanning Wang; Chao Fang; Xiao Wu; Zhongfeng Wang; Jun Lin

arXiv:2409.14017 · cs.AR · October 10, 2024

TL;DR

SPEED is a scalable RISC-V vector processor designed to efficiently support multi-precision DNN inference by introducing customized instructions, reconfigurable hardware, and flexible dataflow strategies, achieving high throughput and energy efficiency.

Contribution

The paper presents SPEED, a novel RISC-V vector processor with customized instructions and reconfigurable hardware for efficient multi-precision DNN inference, addressing limitations of existing processors.

Findings

01. Achieves 737.9 GOPS peak throughput for 4-bit operators.

02. Attains 1383.4 GOPS/W energy efficiency for 4-bit operators.

03. Outperforms prior RVV processors in area efficiency by up to 26.9 times.
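
As a rough sanity check on the first two findings, the sketch below derives the power draw and the energy per operation that they would imply together. It assumes, and this summary does not confirm, that the peak throughput and the energy efficiency were measured at the same operating point; the function names are illustrative only.

```python
# Figures for 4-bit operators, taken from the findings listed above.
PEAK_THROUGHPUT_GOPS = 737.9
ENERGY_EFFICIENCY_GOPS_PER_W = 1383.4


def implied_power_watts(throughput_gops: float, efficiency_gops_per_w: float) -> float:
    """Power implied by throughput / efficiency.

    ASSUMPTION: both numbers describe the same operating point
    (same workload, frequency, and voltage).
    """
    if efficiency_gops_per_w <= 0:
        raise ValueError("efficiency must be positive")
    return throughput_gops / efficiency_gops_per_w


def energy_per_op_picojoules(efficiency_gops_per_w: float) -> float:
    """Energy per operation in picojoules.

    1 GOPS/W equals 1e9 operations per joule, so one operation costs
    1 / (efficiency * 1e9) joules, i.e. 1000 / efficiency picojoules.
    """
    if efficiency_gops_per_w <= 0:
        raise ValueError("efficiency must be positive")
    return 1000.0 / efficiency_gops_per_w


if __name__ == "__main__":
    power = implied_power_watts(PEAK_THROUGHPUT_GOPS, ENERGY_EFFICIENCY_GOPS_PER_W)
    energy = energy_per_op_picojoules(ENERGY_EFFICIENCY_GOPS_PER_W)
    print(f"implied power: {power:.3f} W")
    print(f"energy per 4-bit op: {energy:.3f} pJ")
```

If the two figures come from different operating points, the implied power is not meaningful, although the energy-per-operation conversion still holds for the efficiency figure on its own.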

Abstract

Deploying deep neural networks (DNNs) on resource-constrained edge platforms is hindered by their substantial computation and storage demands. Quantized multi-precision DNNs (MP-DNNs) offer a promising solution to these limitations but pose challenges for existing RISC-V processors due to complex instructions, suboptimal parallel processing, and inefficient dataflow mapping. To tackle these challenges, SPEED, a scalable RISC-V vector (RVV) processor, is proposed to enable efficient MP-DNN inference, incorporating innovations in customized instructions, hardware architecture, and dataflow mapping. Firstly, dedicated customized RISC-V instructions are introduced based on RVV extensions to reduce instruction complexity, allowing SPEED to support processing precisions ranging from 4-bit to 16-bit with minimal hardware overhead. Secondly, a…
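
To make the idea of multi-precision processing concrete, here is a minimal, generic sketch of sub-word packing: several narrow signed operands share one fixed-width word, so lowering the precision raises the number of multiply-accumulates per word. It is not SPEED's datapath or instruction set, which this summary does not describe; the 16-bit lane width and all function names are assumptions made for illustration.

```python
LANE_BITS = 16  # ASSUMPTION: hypothetical lane width, chosen only for illustration


def pack(values, bits):
    """Pack signed `bits`-wide integers into one LANE_BITS-wide word."""
    per_lane = LANE_BITS // bits
    if len(values) != per_lane:
        raise ValueError(f"expected {per_lane} values for {bits}-bit precision")
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    mask = (1 << bits) - 1
    word = 0
    for i, v in enumerate(values):
        if not lo <= v <= hi:
            raise ValueError(f"{v} does not fit in a signed {bits}-bit field")
        # Two's-complement encode the value, then place it in field i.
        word |= (v & mask) << (i * bits)
    return word


def unpack(word, bits):
    """Recover the signed `bits`-wide integers stored in a packed word."""
    mask = (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    out = []
    for i in range(LANE_BITS // bits):
        field = (word >> (i * bits)) & mask
        # Sign-extend: fields with the top bit set represent negatives.
        out.append(field - (1 << bits) if field & sign_bit else field)
    return out


def packed_dot(a_word, b_word, bits):
    """Dot product of the operands packed in two words (software model)."""
    return sum(x * y for x, y in zip(unpack(a_word, bits), unpack(b_word, bits)))


if __name__ == "__main__":
    for bits in (4, 8, 16):
        print(f"{bits}-bit precision: {LANE_BITS // bits} operand(s) per word")
    a = pack([1, -2, 3, -4], 4)
    b = pack([5, 6, -7, 7], 4)
    print("4-bit packed dot product:", packed_dot(a, b, 4))
```

In this model a 16-bit word holds four 4-bit, two 8-bit, or one 16-bit operand, which is the general reason lower precision can raise throughput on a fixed-width datapath.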
