A Mixed-Precision RISC-V Processor for Extreme-Edge DNN Inference

Gianmarco Ottavi; Angelo Garofalo; Giuseppe Tagliavini; Francesco Conti; Luca Benini; Davide Rossi

arXiv:2010.04073·cs.AR·October 9, 2020

TL;DR

This paper introduces MPIC, a novel RISC-V processor extension supporting mixed-precision quantized neural network inference on microcontrollers, significantly improving performance and energy efficiency without increasing ISA complexity.

Contribution

The work presents a status-based SIMD instruction set extension for RISC-V that enables dynamic mixed-precision QNN inference support without extra opcodes or decode complexity.

Findings

01

MPIC improves performance and energy efficiency by up to 4.9x compared to software implementations.

02

It outperforms Cortex-M4 and M7 microcontrollers by up to 11.7x in performance.

03

MPIC supports 16-, 8-, 4-, and 2-bit operand precision for mixed-precision QNNs.

Abstract

Low bit-width Quantized Neural Networks (QNNs) enable deployment of complex machine learning models on constrained devices such as microcontrollers (MCUs) by reducing their memory footprint. Fine-grained asymmetric quantization (i.e., different bit-widths assigned to weights and activations on a tensor-by-tensor basis) is a particularly interesting scheme to maximize accuracy under a tight memory constraint. However, the lack of sub-byte instruction set architecture (ISA) support in state-of-the-art (SoA) microprocessors makes it hard to fully exploit this extreme quantization paradigm in embedded MCUs. Support for sub-byte and asymmetric QNNs would require many precision formats and an exorbitant amount of opcode space. In this work, we attack this problem with status-based SIMD instructions: rather than encoding precision explicitly, each operand's precision is set dynamically in a core status register.…
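The status-based idea described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the MPIC hardware or its real encoding: the 32-bit register width, the `Core` class, the `set_precision` and `sdotp` names, the signed lane layout, and the rule that the wider operand decides the lane count are all hypothetical choices made for illustration. It shows one dot-product operation whose behavior depends on precision fields held in core state, instead of one opcode per precision pair.

```python
# Illustrative model only: names, register width, and lane layout are
# assumptions for this sketch, not MPIC's actual CSR format or ISA encoding.
REG_BITS = 32                # assumed general-purpose register width
FORMATS = (16, 8, 4, 2)      # precisions listed in the findings above

# Encoding every (activation, weight) pair explicitly would need this many
# variants of a single dot-product instruction.
EXPLICIT_VARIANTS = len(FORMATS) ** 2


def pack(values, bits):
    """Pack signed integers into one word, lane 0 in the lowest bits."""
    mask = (1 << bits) - 1
    word = 0
    for i, v in enumerate(values):
        word |= (v & mask) << (i * bits)
    return word


def unpack(word, bits, count):
    """Extract `count` sign-extended lanes of width `bits` from a word."""
    mask = (1 << bits) - 1
    lanes = []
    for i in range(count):
        v = (word >> (i * bits)) & mask
        if v >= 1 << (bits - 1):   # sign-extend negative lanes
            v -= 1 << bits
        lanes.append(v)
    return lanes


class Core:
    """Toy core with a status register holding the two operand precisions."""

    def __init__(self):
        self.a_bits = 8
        self.w_bits = 8

    def set_precision(self, a_bits, w_bits):
        # Stands in for a write to the core status register.
        if a_bits not in FORMATS or w_bits not in FORMATS:
            raise ValueError("unsupported precision")
        self.a_bits, self.w_bits = a_bits, w_bits

    def sdotp(self, ra, rb, acc=0):
        # One "opcode": the lane widths come from core state, not the encoding.
        # Assumption: the wider operand determines how many lanes are used.
        lanes = REG_BITS // max(self.a_bits, self.w_bits)
        a = unpack(ra, self.a_bits, lanes)
        w = unpack(rb, self.w_bits, lanes)
        return acc + sum(x * y for x, y in zip(a, w))


core = Core()
core.set_precision(a_bits=8, w_bits=4)      # 8-bit activations, 4-bit weights
activations = pack([1, -2, 3, -4], 8)
weights = pack([1, 2, -3, -1], 4)
result = core.sdotp(activations, weights)   # 1*1 + (-2)*2 + 3*(-3) + (-4)*(-1)
```

In this model, switching a layer from 8-bit to 4-bit weights is a single `set_precision` call before the inner loop; the `sdotp` call site stays the same, which mirrors the abstract's point that precision is set dynamically rather than encoded in the instruction.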

Taxonomy

Topics: Advanced Neural Network Applications · Parallel Computing and Optimization Techniques · Machine Learning and ELM