A Flow-Based Neural Network for Time Domain Speech Enhancement
Martin Strauss, Bernd Edler

TL;DR
This paper introduces a flow-based neural network for time domain speech enhancement. It adapts WaveGlow to enhance noisy speech directly and reports results competitive with state-of-the-art methods.
Contribution
The paper presents a novel flow-based framework for speech enhancement that directly models clean speech conditioned on noisy input, using adapted WaveGlow and input companding techniques.
Findings
Achieves results comparable to current state-of-the-art GAN-based methods
Surpasses baseline models on objective metrics
Demonstrates effectiveness of nonlinear input companding
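The companding finding can be illustrated with a minimal sketch. It assumes a µ-law curve applied without quantization and `mu = 255`; this summary does not state which companding function or parameter the authors use, and the Laplacian test signal is only a stand-in for speech amplitudes.

```python
import numpy as np

def mu_law_compress(x, mu=255.0):
    """Logarithmically compress samples in [-1, 1]; output stays in [-1, 1]."""
    x = np.clip(x, -1.0, 1.0)
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)

def mu_law_expand(y, mu=255.0):
    """Exact inverse of mu_law_compress (no quantization step involved)."""
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(mu)) / mu

# Assumption: a Laplacian signal stands in for speech, whose samples
# cluster near zero with occasional large excursions.
rng = np.random.default_rng(0)
x = np.clip(rng.laplace(scale=0.05, size=16000), -1.0, 1.0)

y = mu_law_compress(x)      # companded samples fed to the model
x_rec = mu_law_expand(y)    # expansion applied to the model output

print("std before companding:", np.std(x))
print("std after companding: ", np.std(y))
```

Small amplitudes are stretched and large ones compressed, so the companded samples spread more evenly over [-1, 1]. Because the mapping is invertible, the enhanced signal can be expanded back to the original amplitude scale.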
Abstract
Speech enhancement involves the distinction of a target speech signal from an intrusive background. Although generative approaches using Variational Autoencoders or Generative Adversarial Networks (GANs) have increasingly been used in recent years, normalizing flow (NF) based systems are still scarce, despite their success in related fields. Thus, in this paper we propose an NF framework to directly model the enhancement process by density estimation of clean speech utterances conditioned on their noisy counterpart. The WaveGlow model from speech synthesis is adapted to enable direct enhancement of noisy utterances in time domain. In addition, we demonstrate that nonlinear input companding benefits the model performance by equalizing the distribution of input samples. Experimental evaluation on a publicly available dataset shows comparable results to current state-of-the-art GAN-based…
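To make the idea of a conditional flow concrete, here is a minimal sketch of one affine coupling step whose scale and shift depend on the noisy signal. The function `toy_conditioner` and the weights `w` are hypothetical stand-ins for the network that would predict these parameters; this is not the authors' architecture.

```python
import numpy as np

def toy_conditioner(x_a, cond, w):
    """Hypothetical stand-in for the learned network: returns log-scale and shift."""
    h = np.tanh(w["a"] * x_a + w["c"] * cond)
    return w["s"] * h, w["t"] * h

def coupling_forward(x, cond, w):
    """Clean speech -> latent. The first half passes through unchanged."""
    x_a, x_b = np.split(x, 2)
    log_s, t = toy_conditioner(x_a, cond, w)
    z_b = np.exp(log_s) * x_b + t
    log_det = np.sum(log_s)  # log |det Jacobian| of the affine map
    return np.concatenate([x_a, z_b]), log_det

def coupling_inverse(z, cond, w):
    """Latent -> clean speech, using the same conditioner outputs."""
    z_a, z_b = np.split(z, 2)
    log_s, t = toy_conditioner(z_a, cond, w)
    x_b = (z_b - t) * np.exp(-log_s)
    return np.concatenate([z_a, x_b])

rng = np.random.default_rng(0)
clean = rng.standard_normal(8)   # toy "clean" frame
noisy = rng.standard_normal(4)   # toy conditioning derived from the noisy input
w = {"a": 0.7, "c": -0.3, "s": 0.5, "t": 1.2}  # arbitrary illustrative weights

z, log_det = coupling_forward(clean, noisy, w)
recovered = coupling_inverse(z, noisy, w)

# Training would maximize log p(z) + log_det under a Gaussian prior on z.
log_likelihood = -0.5 * np.sum(z**2) - 0.5 * z.size * np.log(2 * np.pi) + log_det
print("max reconstruction error:", np.max(np.abs(recovered - clean)))
```

Since the untouched half determines the scale and shift, the step is exactly invertible no matter how complex the conditioner is. At inference, sampling `z` from the prior and running the inverse with the noisy conditioning would produce an enhanced signal.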
Taxonomy
Methods: Invertible 1x1 Convolution · Affine Coupling · Normalizing Flows · WaveGlow
