Memory Requirement Reduction of Deep Neural Networks Using Low-bit Quantization of Parameters

Niccolò Nicodemo; Gaurav Naithani; Konstantinos Drossos; Tuomas Virtanen; Roberto Saletti

arXiv:1911.00527 · eess.AS · November 5, 2019 · 1 citation

Open Access

TL;DR

This paper introduces a non-uniform, dynamic quantization method with a virtual bit shift (VBS) scheme that halves the memory footprint of a DNN at the cost of a 2.7% drop in STOI, validated on a speech enhancement task.

Contribution

It proposes a non-uniform quantization approach, combined with a virtual bit shift (VBS) scheme, that quantizes DNN parameters dynamically across layers and within a single layer, reducing memory requirements without substantial accuracy loss.

Findings

01. 50% reduction in DNN memory footprint

02. Only a 2.7% drop in speech intelligibility (STOI)

03. Validated on a speech enhancement application

Abstract

Effective employment of deep neural networks (DNNs) in mobile devices and embedded systems is hampered by requirements for memory and computational power. This paper presents a non-uniform quantization approach that allows for dynamic quantization of DNN parameters for different layers and within the same layer. A virtual bit shift (VBS) scheme is also proposed to improve the accuracy of the proposed scheme. Our method reduces the memory requirements while preserving the performance of the network. The performance of our method is validated in a speech enhancement application, where a fully connected DNN is used to predict the clean speech spectrum from the input noisy speech spectrum. A DNN is optimized and its memory footprint and performance are evaluated using the short-time objective intelligibility (STOI) metric. The application of the low-bit quantization allows a 50% reduction of…
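
The abstract does not spell out the quantizer or the VBS scheme, so the following is only a generic illustration of what non-uniform, per-layer low-bit quantization can look like, not the authors' method. It assumes a quantile-based codebook (levels placed where the weights are dense) and a simple storage estimate that counts the indices plus the codebook. The names `build_codebook`, `quantize`, `dequantize`, and `footprint_bits` are hypothetical, and VBS is not reproduced here.

```python
import random
from bisect import bisect_left


def build_codebook(weights, bits):
    """Quantile-based codebook with 2**bits levels (assumption: not the paper's quantizer).

    Levels are denser where the weights cluster, which makes the grid non-uniform.
    """
    levels = 2 ** bits
    ordered = sorted(weights)
    n = len(ordered)
    # One representative per equal-population bin: the value at the bin's middle rank.
    return [ordered[min(n - 1, int((i + 0.5) * n / levels))] for i in range(levels)]


def quantize(weights, codebook):
    """Map each weight to the index of its nearest codebook entry."""
    indices = []
    for w in weights:
        pos = bisect_left(codebook, w)
        # Only the two neighbours around the insertion point can be nearest.
        candidates = [j for j in (pos - 1, pos) if 0 <= j < len(codebook)]
        indices.append(min(candidates, key=lambda j: abs(codebook[j] - w)))
    return indices


def dequantize(indices, codebook):
    """Recover approximate weights from stored indices."""
    return [codebook[i] for i in indices]


def footprint_bits(num_weights, bits, float_bits=32):
    """Storage estimate: low-bit indices plus the full-precision codebook."""
    return num_weights * bits + (2 ** bits) * float_bits


# Hypothetical layer: 10,000 weights drawn from a narrow Gaussian.
random.seed(0)
layer = [random.gauss(0.0, 0.1) for _ in range(10_000)]

codebook = build_codebook(layer, bits=4)
indices = quantize(layer, codebook)
restored = dequantize(indices, codebook)

mean_abs_error = sum(abs(a - b) for a, b in zip(layer, restored)) / len(layer)
ratio = footprint_bits(len(layer), bits=4) / (len(layer) * 32)
print(f"mean abs error: {mean_abs_error:.4f}, footprint ratio: {ratio:.3f}")
```

Because each layer gets its own codebook and its own `bits` value, the bit width can differ from layer to layer. The footprint ratio of this toy example says nothing about the 50% figure reported for the paper's scheme, which depends on the authors' own bit allocation.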

Taxonomy

Topics: Speech and Audio Processing · Blind Source Separation Techniques · Advanced Data Compression Techniques