Invertible DNN-based nonlinear time-frequency transform for speech enhancement
Daiki Takeuchi, Kohei Yatabe, Yuma Koizumi, Yasuhiro Oikawa, Noboru Harada

TL;DR
This paper introduces an invertible DNN-based nonlinear time-frequency transform for speech enhancement, enabling end-to-end training while preserving perfect reconstruction, thus combining interpretability with deep learning capabilities.
Contribution
It proposes a novel invertible nonlinear T-F transform based on DNNs that ensures perfect reconstruction, improving interpretability and effectiveness in speech enhancement.
Findings
The proposed transform achieves perfect reconstruction.
Experiments demonstrate enhanced speech quality.
Outperforms traditional linear T-F transforms.
Abstract
We propose an end-to-end speech enhancement method with a trainable time-frequency (T-F) transform based on an invertible deep neural network (DNN). The recent development of speech enhancement has been driven by DNNs. Ordinary DNN-based speech enhancement employs a T-F transform, typically the short-time Fourier transform (STFT), and estimates a T-F mask using a DNN. On the other hand, some methods have considered end-to-end networks which directly estimate the enhanced signals without a T-F transform. While end-to-end methods have shown promising results, they are black boxes and hard to understand. Therefore, some end-to-end methods used a DNN to learn a linear T-F transform, which is much easier to understand. However, the learned transform may not have a property important for ordinary signal processing. In this paper, as the important property of the T-F transform, perfect…
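To make the idea of a nonlinear transform with perfect reconstruction concrete, here is a minimal sketch of an additive coupling step, a common building block for invertible networks. It is an illustration only: the excerpt above does not describe the paper's actual architecture, and the names `shift`, `forward`, and `inverse`, the `tanh` stand-in for a learned sub-network, and the toy frame values are all assumptions.

```python
import math


def shift(half, weight=0.5):
    """Hypothetical stand-in for a learned sub-network.

    It does not need to be invertible itself; any deterministic function works.
    """
    return [math.tanh(weight * v) for v in half]


def forward(frame):
    """Additive coupling: pass the first half through, shift the second half."""
    if len(frame) % 2 != 0:
        raise ValueError("this sketch assumes an even-length frame")
    mid = len(frame) // 2
    a, b = frame[:mid], frame[mid:]
    return a + [bv + s for bv, s in zip(b, shift(a))]


def inverse(coeffs):
    """Undo the coupling by subtracting the same shift, recomputed from the first half."""
    if len(coeffs) % 2 != 0:
        raise ValueError("this sketch assumes an even-length frame")
    mid = len(coeffs) // 2
    a, b = coeffs[:mid], coeffs[mid:]
    return a + [bv - s for bv, s in zip(b, shift(a))]


frame = [0.3, -1.2, 0.7, 2.0]      # toy signal frame (made-up values)
coeffs = forward(frame)            # nonlinear "analysis" step
restored = inverse(coeffs)         # exact "synthesis" step
max_error = max(abs(x - y) for x, y in zip(frame, restored))
print(max_error)                   # reconstruction error at floating-point precision
```

Because the first half passes through unchanged, the inverse can recompute exactly the same shift and subtract it, so reconstruction holds by construction no matter what `shift` computes. This is the kind of structural guarantee that lets a trainable nonlinear transform keep perfect reconstruction.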
Taxonomy
Topics: Speech and Audio Processing · Blind Source Separation Techniques · Advanced Adaptive Filtering Techniques
