Resource-Efficient Speech Mask Estimation for Multi-Channel Speech Enhancement

Lukas Pfeifenberger; Matthias Zöhrer; Günther Schindler; Wolfgang Roth; Holger Fröning; Franz Pernkopf

arXiv:2007.11477·eess.AS·July 23, 2020

Open Access

TL;DR

This paper presents a resource-efficient deep learning approach for multi-channel speech enhancement that uses reduced-precision neural networks to estimate speech masks, enabling faster and more memory-efficient processing suitable for embedded systems.

Contribution

It introduces a method that uses reduced-precision DNNs for speech mask estimation, achieving audio quality close to that of single-precision DNNs with significantly lower execution time and memory footprint.
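As a rough illustration of what "reduced precision" can mean in practice, the sketch below binarizes a weight matrix to {-1, +1} and uniformly quantizes activations to a few bits. It is a minimal NumPy sketch under stated assumptions, not the authors' training procedure: the function names, the per-layer scale factor, and the [0, 1] activation range are all hypothetical choices made for this example.

```python
import numpy as np

def binarize_weights(w):
    """Map real-valued weights to {-1, +1} plus one scale per layer.

    The mean-absolute-value scale is an assumption of this sketch,
    not something stated in the source.
    """
    scale = float(np.mean(np.abs(w)))
    w_bin = np.where(w >= 0, 1.0, -1.0)   # sign with sign(0) := +1
    return w_bin, scale

def quantize_activations(x, bits=4):
    """Uniformly quantize activations, assumed to lie in [0, 1]."""
    levels = 2 ** bits - 1                # number of quantization steps
    x = np.clip(x, 0.0, 1.0)              # values outside the range saturate
    return np.round(x * levels) / levels

def binary_dense(x, w):
    """Dense layer using binarized weights and a single scale factor."""
    w_bin, scale = binarize_weights(w)
    return scale * (x @ w_bin)
```

Each binarized weight needs one bit of storage instead of the 32 bits of a single-precision float, which is where the memory saving comes from.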

Findings

01. Significant reduction in execution time and memory footprint with binary weights and reduced-precision activations.

02. Audio quality close to that of single-precision DNNs.

03. Slight increase in Word Error Rate (WER) for single-speaker scenarios.

Abstract

While machine learning techniques are traditionally resource-intensive, we are currently witnessing an increased interest in hardware- and energy-efficient approaches. This need for resource-efficient machine learning is primarily driven by the demand for embedded systems and their usage in ubiquitous computing and IoT applications. In this article, we provide a resource-efficient approach for multi-channel speech enhancement based on Deep Neural Networks (DNNs). In particular, we use reduced-precision DNNs for estimating a speech mask from noisy, multi-channel microphone observations. This speech mask is used to obtain either the Minimum Variance Distortionless Response (MVDR) or Generalized Eigenvalue (GEV) beamformer. In the extreme case of binary weights and reduced-precision activations, a significant reduction of execution time and memory footprint is possible while still obtaining…
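To make the step from speech mask to beamformer concrete, the following NumPy sketch shows one standard way of turning a mask into MVDR or GEV weights for a single frequency bin. It assumes the mask has already been estimated (by any network), uses the reference-microphone form of the MVDR solution, and takes the GEV beamformer as the principal generalized eigenvector of the speech and noise covariance matrices. The function names and array layout are assumptions of this sketch; the paper's exact formulation may differ.

```python
import numpy as np

def masked_psd(Y, mask):
    """Mask-weighted spatial covariance for one frequency bin.

    Y:    (channels, frames) complex STFT coefficients
    mask: (frames,) values in [0, 1]
    """
    weighted = Y * mask                      # broadcast mask over channels
    return (weighted @ Y.conj().T) / max(float(mask.sum()), 1e-10)

def mvdr_weights(phi_ss, phi_nn, ref=0):
    """MVDR weights relative to reference microphone `ref`."""
    num = np.linalg.solve(phi_nn, phi_ss)    # Phi_NN^{-1} Phi_SS
    return num[:, ref] / np.trace(num)

def gev_weights(phi_ss, phi_nn):
    """GEV weights: principal eigenvector of Phi_NN^{-1} Phi_SS."""
    vals, vecs = np.linalg.eig(np.linalg.solve(phi_nn, phi_ss))
    return vecs[:, np.argmax(vals.real)]

def beamform(w, Y):
    """Apply weights w (channels,) to Y (channels, frames)."""
    return w.conj() @ Y
```

In use, the speech covariance would come from `masked_psd(Y, mask)` and the noise covariance from `masked_psd(Y, 1.0 - mask)`, computed separately for every frequency bin. For a rank-one speech covariance, the MVDR weights above pass the speech component at the reference microphone undistorted, which is the property the name refers to.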

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Advanced Adaptive Filtering Techniques