Using Optimal Ratio Mask as Training Target for Supervised Speech Separation

Shasha Xia; Hao Li; Xueliang Zhang

arXiv:1709.00917 · cs.SD · September 5, 2017 · 2 cites

TL;DR

This paper proposes using the optimal ratio mask as a training target for supervised speech separation with deep neural networks, demonstrating improved performance across various noise conditions.

Contribution

It uses the optimal ratio mask, which accounts for the correlation between noise and clean speech, as the DNN training target, improving speech separation performance.

Findings

01 Optimal ratio mask outperforms other targets in various noise environments

02 Improved speech quality and intelligibility with the proposed method

03 Robustness across different SNR conditions

Abstract

Supervised speech separation uses supervised learning algorithms to learn a mapping from an input noisy signal to an output target. With the rapid development of deep learning, supervised separation has become the most important direction in the speech separation area in recent years. For a supervised algorithm, the training target has a significant impact on performance. The ideal ratio mask is a commonly used training target that can improve the intelligibility and quality of the separated speech. However, it does not take into account the correlation between noise and clean speech. In this paper, we use the optimal ratio mask as the training target of the deep neural network (DNN) for speech separation. The experiments are carried out under various noise environments and signal-to-noise ratio (SNR) conditions. The results show that the optimal ratio mask outperforms other training…
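The abstract contrasts the two masks without giving formulas. The following is a minimal sketch of the difference for a single time-frequency bin. It assumes the commonly cited definitions (the ideal ratio mask as a power ratio raised to an exponent, and the optimal ratio mask as the real gain that minimizes the squared error between the masked mixture and the clean speech); this excerpt does not confirm that the paper uses exactly these forms. The function names and the `beta` parameter are hypothetical, chosen only for illustration.

```python
def ideal_ratio_mask(s, n, beta=0.5):
    """Assumed IRM form: (|s|^2 / (|s|^2 + |n|^2)) ** beta.

    s and n are the complex STFT coefficients of clean speech and noise
    in one time-frequency bin. The cross term between s and n is ignored.
    """
    ps, pn = abs(s) ** 2, abs(n) ** 2
    if ps + pn == 0.0:
        return 0.0
    return (ps / (ps + pn)) ** beta


def optimal_ratio_mask(s, n):
    """Assumed ORM form: (|s|^2 + Re(s n*)) / |s + n|^2.

    This is the real-valued gain m that minimizes |s - m * (s + n)|^2,
    so it keeps the speech-noise cross term Re(s n*).
    """
    y = s + n
    py = abs(y) ** 2
    if py == 0.0:
        return 0.0
    cross = (s * n.conjugate()).real
    return (abs(s) ** 2 + cross) / py


# Noise partly cancels the speech in this bin (destructive interference).
s, n = 1 + 0j, -0.5 + 0j
irm = ideal_ratio_mask(s, n, beta=1.0)   # 0.8
orm = optimal_ratio_mask(s, n)           # 2.0
print(abs(s - irm * (s + n)))            # residual error with the IRM
print(abs(s - orm * (s + n)))            # residual error with the ORM
```

In this bin the cross term is negative, so the sketched optimal ratio mask exceeds 1 and recovers the speech coefficient exactly, while the power-ratio mask cannot. When the cross term is zero, the two forms coincide for `beta=1.0`.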

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis