Improved Normalizing Flow-Based Speech Enhancement using an All-pole Gammatone Filterbank for Conditional Input Representation
Martin Strauss, Matteo Torcoli, Bernd Edler

TL;DR
This paper introduces an improved normalizing flow (NF) model for speech enhancement that uses an All-Pole Gammatone (APG) filterbank as its conditional input representation. In a listening test, it is rated above GAN-based methods in perceptual quality, especially at low SNRs.
Contribution
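The paper proposes architectural modifications to earlier NF-based speech enhancement work and a novel All-Pole Gammatone (APG) filterbank as the conditional input representation, and reports a perceptual advantage over GAN-based methods in a listening test.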
Findings
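NF with time-domain input and APG conditioning outperforms GAN-based methods in a perceptual listening test, especially at low SNRs, even though computational metrics favor the GAN-based methods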
APG provides high temporal resolution for better speech enhancement
NF approach achieves good quality ratings in listening tests
Abstract
Deep generative models for Speech Enhancement (SE) have received increasing attention in recent years. The most prominent examples are Generative Adversarial Networks (GANs), while normalizing flows (NF) have received less attention despite their potential. Building on previous work, architectural modifications are proposed, along with an investigation of different conditional input representations. Despite being a common choice in related works, Mel-spectrograms prove to be inadequate for the given scenario. Alternatively, a novel All-Pole Gammatone filterbank (APG) with high temporal resolution is proposed. Although the results of computational evaluation metrics would suggest that state-of-the-art GAN-based methods perform best, a perceptual evaluation via a listening test indicates that the presented NF approach (based on time domain and APG) performs best, especially at lower SNRs. On average,…
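To make the idea of an all-pole gammatone channel concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the textbook all-pole gammatone form (a cascade of identical two-pole resonators) with bandwidths taken from the Glasberg and Moore ERB formula. The function names, the filter order of 4, the bandwidth factor 1.019, and the per-stage gain normalization are illustrative assumptions that do not come from the paper.

```python
import numpy as np


def erb_hz(fc):
    """Equivalent rectangular bandwidth in Hz (Glasberg & Moore approximation)."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)


def two_pole(x, r, theta):
    """One all-pole resonator with poles at r * exp(+/- j * theta).

    Difference equation: y[n] = g * x[n] + 2 r cos(theta) y[n-1] - r^2 y[n-2],
    where g scales the magnitude response to 1 at the angle theta.
    """
    a1 = 2.0 * r * np.cos(theta)
    a2 = -r * r
    # Magnitude of the denominator at z = exp(j * theta), used as input gain.
    z1 = np.exp(-1j * theta)
    g = abs(1.0 - a1 * z1 - a2 * z1 * z1)
    y = np.zeros(len(x))
    y1 = 0.0
    y2 = 0.0
    for n, xn in enumerate(x):
        yn = g * xn + a1 * y1 + a2 * y2
        y[n] = yn
        y2, y1 = y1, yn
    return y


def apg_filterbank(x, fs, centre_freqs, order=4, bw_scale=1.019):
    """Hypothetical all-pole gammatone filterbank (illustrative only).

    Returns an array of shape (len(centre_freqs), len(x)): one filtered
    signal per channel, at the same sampling rate as the input.
    """
    x = np.asarray(x, dtype=float)
    out = np.empty((len(centre_freqs), len(x)))
    for i, fc in enumerate(centre_freqs):
        # Pole radius from the channel bandwidth, pole angle from the centre frequency.
        r = np.exp(-2.0 * np.pi * bw_scale * erb_hz(fc) / fs)
        theta = 2.0 * np.pi * fc / fs
        y = x
        for _ in range(order):  # cascade of identical two-pole stages
            y = two_pole(y, r, theta)
        out[i] = y
    return out
```

Each channel in this sketch keeps one output sample per input sample, in contrast to a frame-based Mel-spectrogram, which yields one value per hop. How the paper turns its filterbank output into the conditional input is not stated in this summary.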
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Hearing Loss and Rehabilitation
Methods: Test · Normalizing Flows
