Accelerating RNN-based Speech Enhancement on a Multi-Core MCU with Mixed FP16-INT8 Post-Training Quantization
Manuele Rusci, Marco Fariselli, Martin Croome, Francesco Paci, Eric Flamand

TL;DR
This paper introduces a mixed-precision quantization method and optimized software pipeline to accelerate RNN-based speech enhancement on multi-core MCUs, achieving significant speed, memory, and power savings with minimal accuracy loss.
Contribution
It proposes a novel FP16-INT8 mixed-precision post-training quantization scheme and an optimized parallel computation pipeline for RNNs on multi-core MCUs.
Findings
Up to 4x speed-up over FP16 baseline.
Only 0.06 PESQ score degradation with mixed-precision quantization.
Up to 2.5x MCU power saving, and 10x better energy efficiency than prior SE solutions deployed on single-core MCUs.
Abstract
This paper presents an optimized methodology to design and deploy Speech Enhancement (SE) algorithms based on Recurrent Neural Networks (RNNs) on a state-of-the-art MicroController Unit (MCU), with 1+8 general-purpose RISC-V cores. To achieve low-latency execution, we propose an optimized software pipeline interleaving parallel computation of LSTM or GRU recurrent blocks, featuring vectorized 8-bit integer (INT8) and 16-bit floating-point (FP16) compute units, with manually-managed memory transfers of model parameters. To ensure minimal accuracy degradation with respect to the full-precision models, we propose a novel FP16-INT8 Mixed-Precision Post-Training Quantization (PTQ) scheme that compresses the recurrent layers to 8-bit while the bit precision of remaining layers is kept to FP16. Experiments are conducted on multiple LSTM and GRU based SE models trained on the Valentini dataset,…
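The mixed-precision idea in the abstract (recurrent layers compressed to 8-bit, remaining layers kept in FP16) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the symmetric per-tensor INT8 scaling, the flat weight lists, the layer names, and the helper functions `to_fp16`, `quantize_int8`, `dequantize_int8`, and `mixed_precision_ptq` are all hypothetical.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    # struct's "e" format is binary16; packing then unpacking applies the rounding.
    return struct.unpack("<e", struct.pack("<e", x))[0]


def quantize_int8(values):
    """Symmetric per-tensor INT8 quantization (an assumed scheme).

    Returns (integer codes in [-127, 127], scale).
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize_int8(codes, scale):
    """Map INT8 codes back to approximate real values."""
    return [c * scale for c in codes]


def mixed_precision_ptq(layers):
    """Quantize recurrent layers to INT8 and keep all other layers in FP16.

    `layers` maps a hypothetical layer name to (kind, flat list of weights).
    """
    quantized = {}
    for name, (kind, weights) in layers.items():
        if kind in ("lstm", "gru"):
            codes, scale = quantize_int8(weights)
            quantized[name] = ("int8", codes, scale)
        else:
            quantized[name] = ("fp16", [to_fp16(w) for w in weights], None)
    return quantized


# Toy model: one feed-forward layer and one recurrent layer (made-up weights).
model = {
    "fc_in": ("linear", [0.1, -0.25, 0.333]),
    "rnn": ("gru", [0.5, -1.0, 0.25, 0.0]),
}
quantized = mixed_precision_ptq(model)
```

Because this is post-training quantization, the sketch only transforms weights that are already trained and involves no retraining step. A real deployment would also need to decide how activations and the recurrent state are represented, which this sketch does not cover.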
Taxonomy
Topics: Speech and Audio Processing · Hearing Loss and Rehabilitation · Indoor and Outdoor Localization Technologies
Methods: Tanh Activation · Gated Recurrent Unit · Sigmoid Activation · Long Short-Term Memory
