On TasNet for Low-Latency Single-Speaker Speech Enhancement
Morten Kolbæk, Zheng-Hua Tan, Søren Holdt Jensen, Jesper Jensen

TL;DR
This paper shows that TasNet, a time-domain neural network architecture, improves on the state of the art in single-speaker speech enhancement, with the largest gains for modulated noise sources such as speech. It also analyzes TasNet's internal representations and its limitations.
Contribution
The study extends TasNet's application to speech enhancement, showing its superiority over state-of-the-art methods and analyzing its internal representations and performance limitations.
Findings
TasNet improves on the state of the art in speech enhancement, with the largest gains for modulated noise sources such as speech.
TasNet learns an inner-domain representation with separable target and noise signals.
Performance drops with large frame hops, likely due to aliasing.
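The roles of frame length and frame hop in the findings above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the random bases, the ReLU non-linearity, the constant mask, and all sizes are assumptions chosen only for illustration. It shows a signal being framed, projected into a non-negative inner domain, masked, and resynthesized by overlap-add. A larger hop yields fewer inner-domain frames per second, which is the setting in which the summary reports a performance drop.

```python
import numpy as np


def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def encode(frames, enc_basis):
    """Project frames onto a basis; ReLU keeps the representation non-negative.

    The basis would be learned in a real system; here it is a stand-in.
    """
    return np.maximum(frames @ enc_basis, 0.0)


def decode(rep, dec_basis, hop, out_len):
    """Map inner-domain frames back to waveform frames and overlap-add them."""
    frames = rep @ dec_basis
    frame_len = frames.shape[1]
    y = np.zeros(out_len)
    for i, frame in enumerate(frames):
        y[i * hop:i * hop + frame_len] += frame
    return y


# Hypothetical sizes, not taken from the paper.
rng = np.random.default_rng(0)
noisy = rng.standard_normal(1600)
frame_len, n_basis = 40, 64
enc_basis = rng.standard_normal((frame_len, n_basis))
dec_basis = rng.standard_normal((n_basis, frame_len))

for hop in (10, 20, 40):
    rep = encode(frame_signal(noisy, frame_len, hop), enc_basis)
    mask = np.full_like(rep, 0.5)  # placeholder for a network-estimated mask
    enhanced = decode(rep * mask, dec_basis, hop, len(noisy))
    print(f"hop={hop}: {rep.shape[0]} inner-domain frames")
```

In a trained system the mask would be estimated by a separation network so that it keeps the target components of the inner-domain representation and suppresses the noise components.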
Abstract
In recent years, speech processing algorithms have seen tremendous progress, primarily due to the deep learning renaissance. This is especially true for speech separation, where the time-domain audio separation network (TasNet) has led to significant improvements. However, for the related task of single-speaker speech enhancement, which is of obvious importance, it is not yet known whether the TasNet architecture is equally successful. In this paper, we show that TasNet also improves the state of the art for speech enhancement, and that the largest gains are achieved for modulated noise sources such as speech. Furthermore, we show that TasNet learns an efficient inner-domain representation, where target and noise signal components are highly separable. This is especially true for noise in the form of interfering speech signals, which might explain why TasNet performs so well on the separation task.…
Taxonomy
Topics: Speech and Audio Processing · Advanced Adaptive Filtering Techniques · Music and Audio Processing
