Efficient Low-Latency Speech Enhancement with Mobile Audio Streaming Networks

Michał Romaniuk; Piotr Masztalski; Karol Piaskowski; Mateusz Matuszewski

arXiv:2008.07244·eess.AS·August 18, 2020

TL;DR

This paper introduces MASnet, a low-latency, efficient speech enhancement network optimized for mobile devices, utilizing depthwise and pointwise convolutions to reduce computational load while maintaining real-time performance.

Contribution

The paper presents MASnet, a novel architecture that achieves low-latency speech enhancement suitable for mobile devices by reducing computational complexity with depthwise and pointwise convolutions.
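To make the saving concrete, the sketch below counts fused multiply-accumulate operations for one standard convolution layer and for a depthwise plus pointwise pair covering the same input. The layer sizes (64 channels, kernel length 9, 100 output positions) are hypothetical values chosen for illustration and are not taken from the paper; the formulas are the usual operation counts for 1-D convolutions without bias.

```python
def standard_conv_fma(c_in, c_out, kernel, positions):
    """FMAs for a standard convolution: every output channel mixes
    every input channel across the full kernel at each output position."""
    return c_in * c_out * kernel * positions


def separable_conv_fma(c_in, c_out, kernel, positions):
    """FMAs for a depthwise convolution followed by a pointwise (1x1) one."""
    depthwise = c_in * kernel * positions   # one filter per input channel
    pointwise = c_in * c_out * positions    # 1x1 mixing across channels
    return depthwise + pointwise


# Hypothetical layer dimensions, NOT from the paper.
c_in, c_out, kernel, positions = 64, 64, 9, 100

standard = standard_conv_fma(c_in, c_out, kernel, positions)
separable = separable_conv_fma(c_in, c_out, kernel, positions)
ratio = standard / separable

print(f"standard:  {standard} FMAs")
print(f"separable: {separable} FMAs")
print(f"reduction: {ratio:.2f}x")
```

For these assumed sizes the separable pair needs roughly an eighth of the operations. In general the ratio is 1 / (1/c_out + 1/kernel), so the saving grows with both the channel count and the kernel size.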

Findings

1. MASnet requires significantly fewer fused multiply-accumulate operations per second (FMA/s) than a similar fully-convolutional model.

2. MASnet maintains real-time processing in its low-latency incremental inference mode.

3. SNR is slightly reduced, but remains acceptable for practical applications.

Abstract

We propose Mobile Audio Streaming Networks (MASnet) for efficient low-latency speech enhancement, which is particularly suitable for mobile devices and other applications where computational capacity is a limitation. MASnet processes linear-scale spectrograms, transforming successive noisy frames into complex-valued ratio masks which are then applied to the respective noisy frames. MASnet can operate in a low-latency incremental inference mode which matches the complexity of layer-by-layer batch mode. Compared to a similar fully-convolutional architecture, MASnet incorporates depthwise and pointwise convolutions for a large reduction in fused multiply-accumulate operations per second (FMA/s), at the cost of some reduction in SNR.
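The masking step described in the abstract can be sketched as an element-wise complex multiplication between a predicted mask and the noisy spectrogram frame. The code below is a minimal illustration in plain Python, not the authors' implementation: `apply_complex_mask`, `enhance_stream`, and `unity_mask` are hypothetical names, and `unity_mask` is only a stand-in for the network that would actually predict the mask.

```python
def apply_complex_mask(noisy_frame, mask):
    """Multiply each complex frequency bin of a frame by its mask value."""
    if len(noisy_frame) != len(mask):
        raise ValueError("frame and mask must have the same number of bins")
    return [m * x for m, x in zip(mask, noisy_frame)]


def enhance_stream(frames, estimate_mask):
    """Process frames one at a time, as a streaming system would.

    `estimate_mask` is a placeholder for the mask-predicting network.
    """
    for frame in frames:
        yield apply_complex_mask(frame, estimate_mask(frame))


def unity_mask(frame):
    """Hypothetical stand-in for the network: passes every bin unchanged."""
    return [1 + 0j] * len(frame)


# One noisy frame with three complex frequency bins (made-up values).
noisy_frame = [1 + 1j, 0.5 - 0.2j, -0.3 + 0.8j]

# A complex mask can scale a bin, rotate its phase, or suppress it.
mask = [0.5 + 0j, 1j, 0j]
enhanced = apply_complex_mask(noisy_frame, mask)

# Streaming usage with the placeholder estimator.
streamed = list(enhance_stream([noisy_frame, noisy_frame], unity_mask))
```

Because the mask is complex-valued, it can adjust both the magnitude and the phase of each bin, whereas a real-valued mask can only rescale magnitudes.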
