Analysis of Noisy-target Training for DNN-based speech enhancement

Takuya Fujimura; Tomoki Toda

arXiv:2211.01198 · eess.AS · November 3, 2022

TL;DR

This paper analyzes Noisy-target Training (NyTT) for speech enhancement. It characterizes the method's properties, proposes improvements, and shows performance gains from large noisy datasets, thereby mitigating the scarcity of clean training speech.

Contribution

The paper provides a detailed analysis of NyTT, proposes a refined training method, and demonstrates performance improvements with large noisy datasets.

Findings

01

NyTT can train DNNs without clean speech.

02

The refined method achieves performance comparable to training with clean speech.

03

Using large noisy datasets improves speech enhancement results.

Abstract

Deep neural network (DNN)-based speech enhancement usually uses clean speech as the training target. However, it is hard to collect large amounts of clean speech because recording it is very costly. As a result, the performance of current speech enhancement has been limited by the amount of training data. To relax this limitation, Noisy-target Training (NyTT), which uses noisy speech as the training target, has been proposed. Although it has been shown experimentally that NyTT can train a DNN without clean speech, no detailed analysis has been conducted and its behavior is not well understood. In this paper, we conduct various analyses to deepen our understanding of NyTT. In addition, based on the properties of NyTT, we propose a refined method whose performance is comparable to that of the method using clean speech. Furthermore, we show that we can improve the performance by using a huge amount…
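To our understanding of the original NyTT formulation, training pairs are built by adding further noise to an already-noisy recording: the DNN receives this "noisier" signal as input and is trained to reproduce the original noisy signal, so no clean speech is needed. The sketch below illustrates only that pair construction, not the paper's refined method. It assumes signals are plain Python lists of float samples at a common sampling rate. The helper names `mean_power`, `mix_at_snr`, and `make_nytt_pair` are hypothetical, and the SNR is measured against the noisy signal rather than against clean speech, which is itself an assumption.

```python
import math
import random


def mean_power(x):
    """Mean power (average squared amplitude) of a list of float samples."""
    return sum(s * s for s in x) / len(x)


def mix_at_snr(signal, noise, snr_db):
    """Add `noise` to `signal`, scaled so the ratio of signal power to
    added-noise power equals `snr_db` decibels."""
    if len(noise) < len(signal):
        raise ValueError("noise must be at least as long as signal")
    noise = noise[: len(signal)]  # trim noise to the signal length
    p_noise = mean_power(noise)
    if p_noise == 0.0:
        raise ValueError("noise has zero power")
    # Choose scale so that p_signal / (scale**2 * p_noise) == 10**(snr_db / 10).
    scale = math.sqrt(mean_power(signal) / (p_noise * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(signal, noise)]


def make_nytt_pair(noisy, extra_noise, snr_db):
    """Hypothetical NyTT-style training pair: (noisier input, noisy target).

    The SNR is relative to the noisy recording, since no clean speech
    is assumed to be available.
    """
    noisier = mix_at_snr(noisy, extra_noise, snr_db)
    return noisier, list(noisy)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-ins for a noisy recording and an extra noise clip.
    noisy = [rng.gauss(0.0, 1.0) for _ in range(16000)]
    extra = [rng.gauss(0.0, 0.5) for _ in range(16000)]
    model_input, model_target = make_nytt_pair(noisy, extra, snr_db=5.0)
    print(len(model_input), len(model_target))
```

In an actual training loop, `model_input` would be fed to the enhancement DNN and the loss computed against `model_target`. The extra-noise SNR range is a design choice that this sketch leaves to the caller.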

Taxonomy

Topics: Speech and Audio Processing · Indoor and Outdoor Localization Technologies · Speech Recognition and Synthesis