Towards efficient models for real-time deep noise suppression
Sebastian Braun, Hannes Gamper, Chandan K.A. Reddy, Ivan Tashev

TL;DR
This paper explores small, resource-efficient deep learning models for real-time speech enhancement, balancing computational complexity with speech quality on real-world data.
Contribution
It investigates the tradeoffs between model complexity and speech quality, providing insights into designing efficient models for real-time noise suppression.
Findings
Smaller models can achieve competitive speech quality with proper architecture choices.
Recurrent and convolutional-recurrent architectures show different tradeoffs in complexity and performance.
Evaluation on real-world recordings with a MOS estimator confirms the effectiveness of the proposed models.
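To make the complexity side of this tradeoff concrete, here is a rough sketch of how the per-second multiply-accumulate (MAC) count of a small recurrent enhancement network scales with its hidden size. Everything specific in it is an assumption for illustration and is not taken from the paper: the function names, the 257 input features, the two GRU layers, the single dense output layer, and the 100 frames per second. Only the weight-matrix multiplications are counted; biases and activations are ignored.

```python
def gru_weights(n_in: int, n_hidden: int) -> int:
    """Weight count of one GRU layer, biases ignored.

    A GRU has three gates (update, reset, candidate), each with one
    input matrix (n_in x n_hidden) and one recurrent matrix
    (n_hidden x n_hidden).
    """
    return 3 * (n_in * n_hidden + n_hidden * n_hidden)


def dense_weights(n_in: int, n_out: int) -> int:
    """Weight count of one fully connected layer, bias ignored."""
    return n_in * n_out


def macs_per_second(n_features: int, n_hidden: int, n_gru_layers: int,
                    frames_per_second: int = 100) -> int:
    """Approximate MACs per second of audio for a stacked-GRU model.

    Each weight is used once per frame, so MACs per frame is roughly
    the total weight count. The layer layout is hypothetical.
    """
    macs_per_frame = 0
    n_in = n_features
    for _ in range(n_gru_layers):
        macs_per_frame += gru_weights(n_in, n_hidden)
        n_in = n_hidden
    # Hypothetical output layer mapping back to one value per input feature.
    macs_per_frame += dense_weights(n_hidden, n_features)
    return macs_per_frame * frames_per_second


if __name__ == "__main__":
    # Illustrative hidden sizes only; not the configurations studied in the paper.
    for hidden in (128, 256, 512):
        print(hidden, macs_per_second(n_features=257, n_hidden=hidden, n_gru_layers=2))
```

Because the recurrent matrices grow with the square of the hidden size, doubling the width of the recurrent layers roughly quadruples their cost, which is why small changes in layer width matter for real-time budgets.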
Abstract
With recent research advancements, deep learning models are becoming attractive and powerful choices for speech enhancement in real-time applications. While state-of-the-art models can achieve outstanding results in terms of speech quality and background noise reduction, the main challenge is to obtain models that are compact enough to be resource efficient at inference time. An important but often neglected aspect of data-driven methods is that results can only be convincing when tested on real-world data and evaluated with useful metrics. In this work, we investigate reasonably small recurrent and convolutional-recurrent network architectures for speech enhancement, trained on a large dataset that also includes reverberation. We show interesting tradeoffs between computational complexity and the achievable speech quality, measured on real recordings using a highly accurate MOS…
