Continuous Modeling of the Denoising Process for Speech Enhancement Based on Deep Learning
Zilu Guo, Jun Du, Chin-Hui Lee

TL;DR
This paper models speech denoising as a continuous process running from noisy speech to clean speech, so that a neural network can apply a controllable amount of noise reduction, improving speech quality and recognition performance.
Contribution
It introduces a novel continuous modeling approach for speech denoising using a state variable and a controllable embedding, enabling adjustable noise reduction.
Findings
Controllable noise reduction improves speech quality.
Preserving small noise levels benefits speech recognition.
The method outperforms traditional discrete denoising models.
Abstract
In this paper, we explore a continuous modeling approach for deep-learning-based speech enhancement, focusing on the denoising process. We use a state variable to indicate the denoising process. The starting state is noisy speech and the ending state is clean speech. The noise component in the state variable decreases with the change of the state index until the noise component is 0. During training, a UNet-like neural network learns to estimate every state variable sampled from the continuous denoising process. In testing, we introduce a controlling factor as an embedding, ranging from zero to one, to the neural network, allowing us to control the level of noise reduction. This approach enables controllable speech enhancement and is adaptable to various application scenarios. Experimental results indicate that preserving a small amount of noise in the clean target benefits speech…
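The state variable described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the noise component shrinks with the state index, so the linear schedule below is an assumption made for illustration, as are the function name `state_variable`, the index convention (0 for noisy speech, 1 for clean speech), and the synthetic signals. It is not the authors' implementation.

```python
import numpy as np


def state_variable(clean, noise, t):
    """Hypothetical state at index t in [0, 1].

    t = 0 gives the noisy speech (starting state) and t = 1 gives the
    clean speech (ending state). The linear noise schedule is an
    assumption; the paper may use a different one.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    # Assumed schedule: the noise scale falls linearly from 1 to 0.
    return clean + (1.0 - t) * noise


# Synthetic stand-ins for one second of 16 kHz audio.
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
noise = 0.1 * rng.standard_normal(16000)

noisy = state_variable(clean, noise, 0.0)           # starting state
almost_clean = state_variable(clean, noise, 0.9)    # keeps 10% of the noise
```

Under these assumptions, a target such as `almost_clean` corresponds to the case the abstract highlights, where a small amount of noise is preserved instead of being removed completely.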
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Ultrasonics and Acoustic Wave Propagation
