Deep Time Delay Neural Network for Speech Enhancement with Full Data Learning
Cunhang Fan, Bin Liu, Jianhua Tao, Jiangyan Yi, Zhengqi Wen, Leichao Song

TL;DR
This paper introduces a deep time delay neural network (TDNN) for speech enhancement that captures long-range temporal contexts efficiently, uses full data learning to maximize training data utilization, and achieves better performance with lower inference time compared to RNNs.
Contribution
The paper proposes a novel TDNN architecture with full data learning for speech enhancement, improving performance and reducing inference time compared to traditional RNNs and BLSTMs.
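To make the idea of a feed-forward layer with long temporal context concrete, here is a minimal NumPy sketch of a generic TDNN layer, in which each output frame is computed from a few input frames at fixed time offsets. The function names, the edge padding, the ReLU activation, and the offsets used are illustrative assumptions; this summary does not give the paper's actual layer configuration.

```python
import numpy as np

def tdnn_layer(x, weights, bias, offsets):
    """Generic TDNN layer (illustrative, not the paper's exact configuration).

    x:       (T, D_in) sequence of feature frames
    weights: (len(offsets), D_in, D_out), one weight matrix per time offset
    bias:    (D_out,)
    offsets: frame offsets spliced together, e.g. (-2, 0, 2)
    """
    T = x.shape[0]
    left = max(0, -min(offsets))
    right = max(0, max(offsets))
    # Assumption: repeat the first/last frame at the edges so the output keeps T frames.
    padded = np.pad(x, ((left, right), (0, 0)), mode="edge")
    out = np.zeros((T, weights.shape[2])) + bias
    for k, off in enumerate(offsets):
        # Frame t of this slice is input frame t + off.
        out += padded[left + off : left + off + T] @ weights[k]
    return np.maximum(out, 0.0)  # ReLU, assumed here

def receptive_field(layer_offsets):
    """Number of input frames seen by one output frame of the stacked layers.

    Assumes every layer's offsets span zero (min <= 0 <= max).
    """
    left = sum(-min(o) for o in layer_offsets)
    right = sum(max(o) for o in layer_offsets)
    return left + right + 1

# Hypothetical three-layer stack: context widens with depth, with no recurrence.
stack = [(-2, -1, 0, 1, 2), (-2, 0, 2), (-4, 0, 4)]
print(receptive_field(stack))
```

Because each layer is a fixed set of matrix products over shifted frames, all output frames can be computed in parallel, which is the property that keeps inference cost close to that of a plain feed-forward network.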
Findings
Outperforms DNN in speech enhancement tasks.
Achieves comparable or better results than BLSTM.
Reduces inference time significantly compared to RNN-based models.
Abstract
Recurrent neural networks (RNNs) have shown significant improvements in recent years for speech enhancement. However, the model complexity and inference time cost of RNNs are much higher than those of deep feed-forward neural networks (DNNs), which limits the applications of speech enhancement. This paper proposes a deep time delay neural network (TDNN) for speech enhancement with full data learning. The TDNN uses a modular and incremental design and has excellent potential for capturing long-range temporal contexts. In addition, the TDNN preserves the feed-forward structure, so its inference cost is comparable to that of a standard DNN. To make full use of the training data, we propose a full data learning method for speech enhancement. More specifically, we use not only the noisy-to-clean (input-to-target) pairs to train the enhancement model, but also the clean-to-clean and noise-to-silence…
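The three kinds of training pairs named in the abstract can be sketched as follows. This is only an illustration under stated assumptions: the function name `full_data_pairs`, mixing at the waveform level, the SNR-based noise scaling, and representing silence as an all-zero signal are choices made here, not details taken from the paper, whose abstract is truncated above.

```python
import numpy as np

def full_data_pairs(clean, noise, snr_db=0.0):
    """Build (input, target) pairs for the three mappings named in the abstract.

    Assumptions (not from the paper): 1-D waveforms, noise at least as long
    as clean, additive mixing at a chosen SNR, and silence as all zeros.
    """
    noise = noise[: len(clean)]
    clean_pow = np.mean(clean ** 2) + 1e-12
    noise_pow = np.mean(noise ** 2) + 1e-12
    # Scale the noise so that clean power / noise power matches snr_db.
    gain = np.sqrt(clean_pow / (noise_pow * 10 ** (snr_db / 10)))
    scaled = gain * noise
    return [
        (clean + scaled, clean),          # noisy-to-clean
        (clean, clean),                   # clean-to-clean
        (scaled, np.zeros_like(scaled)),  # noise-to-silence
    ]

rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)       # stand-in for a clean utterance
noise = 0.3 * rng.standard_normal(20000) # stand-in for a noise recording
pairs = full_data_pairs(clean, noise, snr_db=5.0)
```

Each source recording contributes to more than one pair, which is one way to read "full use of the training data": the clean speech and the noise are each used as inputs in their own right, not only as ingredients of the mixture.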
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Infant Health and Development
