Towards speech enhancement using a variational U-Net architecture
Eike J. Nustede, Jörn Anemüller

TL;DR
This paper explores a variational U-Net architecture for speech enhancement, demonstrating improved spectral reconstruction and noise suppression capabilities over traditional methods, especially in reverberant and impulsive noise conditions.
Contribution
The paper introduces a probabilistic bottleneck into the classic U-Net and reconstructs the spectrum directly, showing that the approach works without filter mask estimation.
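To make the idea of a probabilistic bottleneck concrete, here is a minimal NumPy sketch of the usual variational formulation: the encoder output is read as the mean and log-variance of a Gaussian, a latent sample is drawn with the reparameterization trick, and a KL term pulls the latent distribution towards a standard normal. The function name `variational_bottleneck`, the arguments `mu` and `log_var`, and the latent size of 16 are assumptions for illustration. The summary above does not specify the paper's layer sizes or loss weighting, so this is not the authors' implementation.

```python
import numpy as np

def variational_bottleneck(mu, log_var, rng):
    """Hypothetical bottleneck: sample z ~ N(mu, sigma^2) and return the KL term.

    mu, log_var: arrays of shape (batch, latent_dim), assumed to come from
    the encoder half of a U-Net (not modelled here).
    """
    eps = rng.standard_normal(mu.shape)      # noise drawn from N(0, I)
    z = mu + np.exp(0.5 * log_var) * eps     # reparameterization trick
    # KL( N(mu, sigma^2) || N(0, I) ), summed over latent dimensions
    kl = -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var), axis=-1)
    return z, kl

rng = np.random.default_rng(0)
mu = np.zeros((4, 16))        # illustrative batch of 4, latent size 16
log_var = np.zeros((4, 16))   # log-variance 0 means unit variance
z, kl = variational_bottleneck(mu, log_var, rng)

# A latent shifted away from the prior incurs a positive KL penalty.
_, kl_shifted = variational_bottleneck(np.ones((4, 16)), log_var, rng)
```

In a full model the sample `z` would be passed to the decoder, which also receives the encoder's skip connections, and the KL term would be added to the reconstruction loss with some weight.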
Findings
Variational U-Net outperforms classic U-Net in PESQ and STOI scores.
Residual (skip) connections are a prerequisite for direct spectral reconstruction.
Improved suppression of impulsive noise sources.
Abstract
We investigate the viability of a variational U-Net architecture for denoising of single-channel audio data. Deep network speech enhancement systems commonly aim to estimate filter masks, or opt to work on the waveform signal, potentially neglecting relationships across higher dimensional spectro-temporal features. We study the adoption of a probabilistic bottleneck into the classic U-Net architecture for direct spectral reconstruction. Evaluation of several ablation network variants is carried out using signal-to-distortion ratio and perceptual measures, on audio data that includes known and unknown noise types as well as reverberation. Our experiments show that the residual (skip) connections in the proposed system are a prerequisite for successful spectral reconstruction, i.e., without filter mask estimation. Results show, on average, an advantage of the proposed variational U-Net…
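As a rough illustration of the signal-to-distortion ratio mentioned in the abstract, the sketch below computes a plain energy-ratio SDR in decibels between a clean reference and an enhanced estimate. The abstract does not say which SDR variant the authors use (for example a BSS Eval or scale-invariant definition), so the function `sdr_db` and its `eps` stabiliser are assumptions chosen for simplicity.

```python
import numpy as np

def sdr_db(reference, estimate, eps=1e-12):
    """Simple SDR in dB: reference energy over residual-error energy.

    This is an illustrative definition, not necessarily the variant
    used in the paper.
    """
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    distortion = reference - estimate
    ratio = (np.sum(reference**2) + eps) / (np.sum(distortion**2) + eps)
    return 10.0 * np.log10(ratio)

# Toy signals: the estimate is the reference scaled by 0.9,
# so the error amplitude is one tenth of the reference amplitude.
reference = np.array([1.0, 0.0, -1.0, 0.0])
estimate = 0.9 * reference
sdr = sdr_db(reference, estimate)
```

Higher values indicate less distortion. Perceptual measures such as PESQ and STOI need dedicated implementations and are not sketched here.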
Taxonomy
Topics: Speech and Audio Processing · Acoustic Wave Phenomena Research · Hearing Loss and Rehabilitation
Methods: Concatenated Skip Connection · Convolution · Max Pooling · U-Net
