Efficient Low-Latency Speech Enhancement with Mobile Audio Streaming Networks
Michał Romaniuk, Piotr Masztalski, Karol Piaskowski, Mateusz Matuszewski

TL;DR
This paper introduces MASnet, a low-latency, efficient speech enhancement network optimized for mobile devices, utilizing depthwise and pointwise convolutions to reduce computational load while maintaining real-time performance.
Contribution
The paper presents MASnet, a novel architecture that achieves low-latency speech enhancement suitable for mobile devices by reducing computational complexity with depthwise and pointwise convolutions.
Findings
MASnet requires far fewer fused multiply-accumulate operations per second (FMA/s) than a similar fully-convolutional architecture.
In its low-latency incremental inference mode, MASnet matches the computational complexity of layer-by-layer batch mode.
The reduction in FMA/s comes at the cost of some reduction in SNR.
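To illustrate where the FMA saving comes from, the sketch below counts multiply-accumulates for a standard convolution and for a depthwise convolution followed by a pointwise one. It assumes 1-D convolutions along the time axis. The layer sizes and frame rate are hypothetical values chosen for illustration, not figures from the paper.

```python
def conv1d_fma(c_in, c_out, kernel, frames):
    """FMAs for a standard 1-D convolution over `frames` output frames."""
    return c_in * c_out * kernel * frames


def separable_conv1d_fma(c_in, c_out, kernel, frames):
    """FMAs for a depthwise convolution followed by a pointwise (1x1) one."""
    depthwise = c_in * kernel * frames   # one filter per input channel
    pointwise = c_in * c_out * frames    # 1x1 convolution mixes channels
    return depthwise + pointwise


# Hypothetical layer: 64 -> 64 channels, kernel size 3, 100 frames per second.
c_in, c_out, kernel, frames_per_second = 64, 64, 3, 100

full = conv1d_fma(c_in, c_out, kernel, frames_per_second)
separable = separable_conv1d_fma(c_in, c_out, kernel, frames_per_second)

print(f"standard:  {full} FMA/s")
print(f"separable: {separable} FMA/s")
print(f"ratio:     {separable / full:.3f}")
```

Under these assumptions the ratio of separable to standard cost is 1/c_out + 1/kernel, so the saving grows with the number of output channels and the kernel size.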
Abstract
We propose Mobile Audio Streaming Networks (MASnet) for efficient low-latency speech enhancement, which is particularly suitable for mobile devices and other applications where computational capacity is a limitation. MASnet processes linear-scale spectrograms, transforming successive noisy frames into complex-valued ratio masks which are then applied to the respective noisy frames. MASnet can operate in a low-latency incremental inference mode which matches the complexity of layer-by-layer batch mode. Compared to a similar fully-convolutional architecture, MASnet incorporates depthwise and pointwise convolutions for a large reduction in fused multiply-accumulate operations per second (FMA/s), at the cost of some reduction in SNR.
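As a rough illustration of the masking step described above, the following NumPy sketch applies a complex-valued ratio mask to one noisy spectrogram frame by element-wise complex multiplication. The mask here is an oracle computed from a known clean frame, standing in for the mask a network would predict. The function name, frame size, and synthetic data are assumptions made for this example and do not come from the paper.

```python
import numpy as np


def apply_complex_mask(noisy_frame, mask):
    """Apply a complex ratio mask to one frame, bin by bin."""
    return noisy_frame * mask


rng = np.random.default_rng(0)
n_bins = 5  # hypothetical number of frequency bins

# Synthetic clean and noise spectra for a single frame.
clean = rng.standard_normal(n_bins) + 1j * rng.standard_normal(n_bins)
noise = 0.3 * (rng.standard_normal(n_bins) + 1j * rng.standard_normal(n_bins))
noisy = clean + noise

# Oracle mask used only for illustration; a model would estimate this
# from the noisy input instead.
ideal_mask = clean / noisy

enhanced = apply_complex_mask(noisy, ideal_mask)
print(np.allclose(enhanced, clean))
```

Because the mask is complex, it can correct both the magnitude and the phase of each frequency bin, which a real-valued magnitude mask cannot do.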
