Quark: An Integer RISC-V Vector Processor for Sub-Byte Quantized DNN Inference

MohammadHossein AskariHemmat; Theo Dupuis; Yoan Fournier; Nizar El Zarif; Matheus Cavalcante; Matteo Perotti; Frank Gurkaynak; Luca Benini; Francois Leduc-Primeau; Yvon Savaria; Jean-Pierre David

arXiv:2302.05996·cs.AR·February 14, 2023

TL;DR

Quark is an integer RISC-V vector processor built for sub-byte quantized DNN inference. Its lanes are 2 times smaller and 1.9 times more power efficient than those of Ara, the design it extends.

Contribution

Quark extends the open-source Ara vector processor with vector instructions for sub-byte quantized operations. It removes the floating-point unit from each lane and uses the CVA6 scalar core for the re-scaling operations that quantized inference requires.

Findings

01 Quark can run quantized models at sub-byte precision, including 1-bit and 2-bit models.

02 For 1-bit and 2-bit models, Quark accelerates Conv2d computation across various input and kernel sizes.

03 Each Quark lane is 2 times smaller and 1.9 times more power efficient than an Ara lane.

Abstract

In this paper, we present Quark, an integer RISC-V vector processor specifically tailored for sub-byte DNN inference. Quark is implemented in GlobalFoundries' 22FDX FD-SOI technology. It is designed on top of Ara, an open-source 64-bit RISC-V vector processor. To accommodate sub-byte DNN inference, Quark extends Ara by adding specialized vector instructions to perform sub-byte quantized operations. We also remove the floating-point unit from Quark's lanes and use the CVA6 RISC-V scalar core for the re-scaling operations that are required in quantized neural network inference. This makes each lane of Quark 2 times smaller and 1.9 times more power efficient than the lanes of Ara. In this paper, we show that Quark can run quantized models at sub-byte precision. Notably, we show that for 1-bit and 2-bit quantized models, Quark can accelerate computation of Conv2d over various ranges of…
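
To make the idea of sub-byte quantized operations concrete, the following is a minimal Python sketch of a bit-serial dot product, one common way to compute low-precision integer dot products with only AND, popcount, shift, and add. The abstract does not specify the semantics of Quark's instructions, so this is an illustration of the general technique and not Quark's implementation. The function names (`pack_bitplanes`, `bitserial_dot`, `rescale`) and the scale parameters are hypothetical, and the operands are assumed to be unsigned.

```python
def pack_bitplanes(values, bits):
    """Split unsigned `bits`-wide values into bit-planes.

    Plane i is a Python int whose bit `pos` holds bit i of values[pos].
    """
    planes = []
    for i in range(bits):
        plane = 0
        for pos, v in enumerate(values):
            plane |= ((v >> i) & 1) << pos
        planes.append(plane)
    return planes


def bitserial_dot(acts, weights, a_bits, w_bits):
    """Integer dot product built from AND, popcount, shift, and add.

    Uses sum_k a_k * w_k = sum_{i,j} 2^(i+j) * popcount(A_i & W_j),
    where A_i and W_j are the bit-planes of the two operand vectors.
    """
    a_planes = pack_bitplanes(acts, a_bits)
    w_planes = pack_bitplanes(weights, w_bits)
    acc = 0
    for i, a_plane in enumerate(a_planes):
        for j, w_plane in enumerate(w_planes):
            # popcount of the AND counts positions where both bits are set
            acc += bin(a_plane & w_plane).count("1") << (i + j)
    return acc


def rescale(acc, act_scale, weight_scale):
    """Map the integer accumulator back to a real value.

    Assumption: a simple per-tensor scale for each operand. This is the
    kind of step the abstract assigns to the scalar core.
    """
    return acc * act_scale * weight_scale


# Example with 2-bit activations and 2-bit weights.
acts = [3, 1, 2, 0]
weights = [1, 2, 3, 1]
acc = bitserial_dot(acts, weights, a_bits=2, w_bits=2)
real_value = rescale(acc, act_scale=0.5, weight_scale=0.25)
```

The work scales with the product of the two bit widths, which is why this style of computation favors very low precisions such as 1-bit and 2-bit.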

Code & Models

Repositories

PolyMTL-Gr2m/ara (Official)

Taxonomy

Topics: Advanced Neural Network Applications · Neural Networks and Applications · Parallel Computing and Optimization Techniques