Unsupervised Training for Deep Speech Source Separation with Kullback-Leibler Divergence Based Probabilistic Loss Function

Masahito Togami; Yoshiki Masuyama; Tatsuya Komatsu; Yu Nakagome

arXiv:1911.04228 · eess.AS · November 12, 2019 · 1 cites

Open Access

TL;DR

This paper introduces an unsupervised deep learning approach for multi-channel speech source separation that uses a probabilistic loss based on Kullback-Leibler Divergence, enabling effective training without clean signals.

Contribution

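The paper proposes an unsupervised training method whose probabilistic loss function is based on the Kullback-Leibler divergence (KLD). The underlying statistical model is a time-varying spatial covariance matrix (SCM) model with reverberation and background-noise submodels, which provides robustness against both.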

Findings

01 Effective training with small datasets (1K utterances)

02 Robust separation in reverberant environments

03 Probabilistic training avoids overfitting to separation errors

Abstract

In this paper, we propose a multi-channel speech source separation method with a deep neural network (DNN) that is trained under the condition that no clean signal is available. As an alternative to a clean signal, the proposed method adopts a speech signal estimated by unsupervised speech source separation with a statistical model. As the statistical model of the microphone input signal, we adopt a time-varying spatial covariance matrix (SCM) model that includes reverberation and background noise submodels, so as to achieve robustness against reverberation and background noise. The DNN infers the intermediate variables needed to construct the time-varying SCM. Speech source separation is performed in a probabilistic manner so as to avoid overfitting to separation error. Since there are multiple intermediate variables, a loss function which evaluates a single intermediate variable is…
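The abstract is truncated here, so the exact form of the paper's loss is not shown. As a rough illustration of what a KLD-based probabilistic loss can look like, the sketch below computes the Kullback-Leibler divergence between two circularly symmetric complex Gaussian distributions, each described by a mean vector and a covariance matrix. This is the standard closed-form KLD between Gaussians, not necessarily the authors' loss; the function name `complex_gaussian_kld`, the two-microphone setup, and the example values are all assumptions made for illustration.

```python
import numpy as np


def complex_gaussian_kld(mu_p, cov_p, mu_q, cov_q):
    """KL(p || q) for circularly symmetric complex Gaussians.

    Hypothetical helper for illustration only. Uses the closed form
    tr(Sq^-1 Sp) + (mq - mp)^H Sq^-1 (mq - mp) - M + ln det Sq - ln det Sp,
    where M is the vector dimension (e.g. number of microphones).
    """
    dim = mu_p.shape[0]
    diff = mu_q - mu_p
    # Solve linear systems instead of forming an explicit inverse.
    trace_term = np.trace(np.linalg.solve(cov_q, cov_p)).real
    maha_term = (diff.conj() @ np.linalg.solve(cov_q, diff)).real
    # slogdet returns (sign, log|det|); the log-magnitude is real.
    _, logdet_p = np.linalg.slogdet(cov_p)
    _, logdet_q = np.linalg.slogdet(cov_q)
    return float(trace_term + maha_term - dim + logdet_q - logdet_p)


# Assumed toy example: two microphones, zero means, isotropic covariances.
mu = np.zeros(2, dtype=complex)
cov_p = np.eye(2, dtype=complex)
cov_q = 2.0 * np.eye(2, dtype=complex)

kld_same = complex_gaussian_kld(mu, cov_p, mu, cov_p)
kld_diff = complex_gaussian_kld(mu, cov_p, mu, cov_q)
```

In this toy case `kld_same` is zero because the two distributions coincide, and `kld_diff` equals 2 ln 2 - 1. A loss of this kind compares whole distributions rather than point estimates, which is one way to read the abstract's remark that separation is performed "in a probabilistic manner".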

Taxonomy

Topics: Speech and Audio Processing · Blind Source Separation Techniques · Music and Audio Processing