Deep Time Delay Neural Network for Speech Enhancement with Full Data Learning

Cunhang Fan; Bin Liu; Jianhua Tao; Jiangyan Yi; Zhengqi Wen; Leichao Song

arXiv:2011.05591 · cs.SD · November 12, 2020

TL;DR

This paper introduces a deep time delay neural network (TDNN) for speech enhancement that captures long-range temporal contexts efficiently, uses full data learning to maximize training data utilization, and achieves better performance with lower inference time compared to RNNs.
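As a rough illustration of how a feed-forward TDNN can cover long temporal context, the sketch below splices frames at fixed time offsets before a linear layer and computes the receptive field of a stack of such layers. The function names, the offsets, and the edge padding are assumptions made for illustration, not the configuration used in the paper.

```python
import numpy as np

def tdnn_layer(x, weight, bias, offsets):
    """One time-delay layer: splice frames at `offsets`, then linear + ReLU.

    x:      (T, D) sequence of feature frames
    weight: (len(offsets) * D, H)
    bias:   (H,)
    Returns an array of shape (T, H).
    """
    T, _ = x.shape
    left = max(0, -min(offsets))
    right = max(0, max(offsets))
    # Assumption: repeat the edge frames so the output keeps T frames.
    padded = np.pad(x, ((left, right), (0, 0)), mode="edge")
    spliced = np.concatenate(
        [padded[left + o : left + o + T] for o in offsets], axis=1
    )
    return np.maximum(spliced @ weight + bias, 0.0)

def receptive_field(layer_offsets):
    """Number of input frames seen by one output frame of a layer stack."""
    left = sum(max(0, -min(o)) for o in layer_offsets)
    right = sum(max(0, max(o)) for o in layer_offsets)
    return left + right + 1

# Hypothetical three-layer stack with widening offsets.
stack = [[-2, -1, 0, 1, 2], [-2, 0, 2], [-3, 0, 3]]
context = receptive_field(stack)
```

Each layer remains a plain matrix multiply over a fixed window, so inference needs no recurrent state, while stacking layers with wider offsets grows the context additively.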

Contribution

The paper proposes a novel TDNN architecture with full data learning for speech enhancement, outperforming DNNs, matching or exceeding BLSTM performance, and reducing inference time relative to RNN-based models.

Findings

01 Outperforms DNN in speech enhancement tasks.

02 Achieves comparable or better results than BLSTM.

03 Reduces inference time significantly compared to RNN-based models.

Abstract

Recurrent neural networks (RNNs) have brought significant improvements to speech enhancement in recent years. However, the model complexity and inference time of RNNs are much higher than those of deep feed-forward neural networks (DNNs), which limits the applications of speech enhancement. This paper proposes a deep time delay neural network (TDNN) for speech enhancement with full data learning. The TDNN uses a modular and incremental design, which gives it excellent potential for capturing long-range temporal contexts. In addition, the TDNN preserves the feed-forward structure, so its inference cost is comparable to that of a standard DNN. To make full use of the training data, we propose a full data learning method for speech enhancement. More specifically, we train the enhancement model not only on noisy-to-clean (input-to-target) pairs, but also on clean-to-clean and noise-to-silence…
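The three kinds of training pairs named in the abstract could be assembled roughly as follows. This is a minimal sketch under stated assumptions: the helper names are hypothetical, the noisy input is made by additive mixing at a chosen SNR, and silence is represented as an all-zero target. The abstract does not specify any of these details.

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Add noise to clean speech at the requested SNR in dB (assumed mixing rule)."""
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12  # guard against all-zero noise
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * noise

def full_data_pairs(clean, noise, snr_db=0.0):
    """Return (input, target) pairs for the three mappings in the abstract."""
    noisy = mix_at_snr(clean, noise, snr_db)
    silence = np.zeros_like(noise)  # assumption: silence modeled as zeros
    return [
        (noisy, clean),    # noisy-to-clean
        (clean, clean),    # clean-to-clean
        (noise, silence),  # noise-to-silence
    ]

# Synthetic stand-ins for one clean utterance and one noise clip.
rng = np.random.default_rng(0)
clean = rng.normal(size=16000)
noise = rng.normal(size=16000)
pairs = full_data_pairs(clean, noise, snr_db=5.0)
```

Under this reading, every clean and noise recording serves as a training input in its own right, in addition to appearing inside a mixture.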

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Infant Health and Development