Speech Enhancement with Perceptually-motivated Optimization and Dual Transformations
Xucheng Wan, Kai Liu, Ziqing Du, Huan Zhou

TL;DR
This paper introduces PT-FSE, a novel sub-band speech enhancement system that leverages perceptually-motivated optimization and dual transformations, significantly improving performance and model efficiency on the DNS2020 benchmark.
Contribution
It proposes a new sub-band enhancement model with frequency and temporal transformations and a perceptually-inspired loss, advancing the state of the art in speech enhancement.
Findings
Achieves the best speech enhancement performance on DNS2020, with an NB-PESQ of 3.57.
Outperforms the current state-of-the-art model while being 27% smaller.
Demonstrates substantial improvements over its backbone model.
Abstract
To address the monaural speech enhancement problem, numerous studies have enhanced speech either in the time domain, operating on an inner domain learned from the speech mixture, or in the time-frequency domain, operating on fixed full-band short-time Fourier transform (STFT) spectrograms. Very recently, a few sub-band based speech enhancement methods have been proposed. By operating on sub-band spectrograms, those studies demonstrated competitive performance on the DNS2020 benchmark dataset. Despite its appeal, this new research direction has not been fully explored, and there is still room for improvement. In this study, we therefore delve into this direction and propose a sub-band based speech enhancement system with perceptually-motivated optimization and dual transformations, called PT-FSE. Specifically, our proposed PT-FSE…
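To illustrate the difference between a full-band STFT spectrogram and sub-band spectrograms, here is a minimal sketch, assuming NumPy, a periodic Hann window, and equal-width contiguous frequency bands. The function names `stft_magnitude` and `split_subbands` and all parameter values (`n_fft=512`, `hop=256`, four bands) are hypothetical choices for illustration; this is not PT-FSE's actual analysis front end, which the abstract does not describe.

```python
import numpy as np


def stft_magnitude(signal: np.ndarray, n_fft: int = 512, hop: int = 256) -> np.ndarray:
    """Full-band magnitude STFT with a periodic Hann window.

    Returns an array of shape (n_fft // 2 + 1, n_frames): frequency bins x frames.
    """
    if len(signal) < n_fft:
        raise ValueError("signal must be at least n_fft samples long")
    window = np.hanning(n_fft + 1)[:-1]  # periodic Hann window of length n_fft
    n_frames = 1 + (len(signal) - n_fft) // hop  # only frames that fit fully, no padding
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    spec = np.fft.rfft(frames, axis=1)  # one-sided spectrum of each frame
    return np.abs(spec).T  # transpose to (frequency bins, frames)


def split_subbands(spec: np.ndarray, n_bands: int) -> list:
    """Split the frequency axis into n_bands contiguous, non-overlapping sub-bands.

    Hypothetical equal-width split: np.array_split gives the first
    (n_bins % n_bands) bands one extra bin.
    """
    return np.array_split(spec, n_bands, axis=0)


# Usage: a 1-second 440 Hz tone sampled at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440 * t)
spec = stft_magnitude(x)
bands = split_subbands(spec, 4)
print(spec.shape)                   # (257, 61)
print([b.shape[0] for b in bands])  # [65, 64, 64, 64]
```

Each sub-band array can then be processed by its own (or a shared) enhancement network instead of feeding the whole full-band spectrogram to one model. Some sub-band models instead use overlapping bands or add neighbouring-bin context; that variant is not shown here.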
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Advanced Adaptive Filtering Techniques
