Deep Bayesian Unsupervised Source Separation Based on a Complex Gaussian Mixture Model

Yoshiaki Bando; Yoko Sasaki; Kazuyoshi Yoshii

arXiv:1908.11307 · cs.SD · August 30, 2019

TL;DR

This paper introduces an unsupervised neural source separation method that uses a complex Gaussian mixture model as its training cost, jointly training separation and localization networks from multichannel mixture signals alone, without supervised training data.

Contribution

The paper proposes a deep Bayesian framework that jointly trains separation and localization networks with a complex Gaussian mixture model, which resolves the frequency permutation ambiguity of the spatial model.

Findings

01

Outperformed conventional initialization methods in simulated speech mixtures

02

Effectively estimates spatial variables without supervised training data

03

Enhances multichannel source separation performance

Abstract

This paper presents an unsupervised method that trains neural source separation by using only multichannel mixture signals. Conventional neural separation methods require a large amount of supervised data to achieve excellent performance. Although multichannel methods based on spatial information can work without such training data, they are often sensitive to parameter initialization and degrade when the sources are located close to each other. The proposed method uses a cost function based on a spatial model called a complex Gaussian mixture model (cGMM). This model has the time-frequency (TF) masks and directions of arrival (DoAs) of sources as latent variables and is used for training separation and localization networks that respectively estimate these variables. This joint training solves the frequency permutation ambiguity of the spatial model in a unified deep Bayesian framework. In…
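
To make the role of the TF masks concrete, the following is a minimal sketch of how a generic cGMM assigns one time-frequency bin to sources. It assumes the common zero-mean formulation in which each source has a spatial covariance matrix and a per-bin power. It is not the paper's training procedure, it does not model the DoA variables or the networks, and the function name `cgmm_mask_posteriors` and all parameter names are hypothetical.

```python
import numpy as np

def cgmm_mask_posteriors(x, H, lam, pi):
    """Posterior source probabilities (mask values) for one TF bin.

    Assumed generic cGMM (not taken from the paper):
        x ~ sum_k pi[k] * N_C(0, lam[k] * H[k])

    x   : (M,) complex multichannel observation at one TF bin
    H   : (K, M, M) Hermitian positive-definite spatial covariances
    lam : (K,) positive source powers at this bin
    pi  : (K,) mixing weights that sum to 1
    """
    K, M = H.shape[0], x.shape[0]
    log_post = np.empty(K)
    for k in range(K):
        cov = lam[k] * H[k]
        # log-determinant of the covariance (real for Hermitian PD input)
        _, logdet = np.linalg.slogdet(cov)
        # quadratic form x^H cov^{-1} x without forming the inverse
        quad = np.real(np.conj(x) @ np.linalg.solve(cov, x))
        # log of pi_k times the complex Gaussian density
        log_post[k] = np.log(pi[k]) - M * np.log(np.pi) - logdet - quad
    # normalize in the log domain for numerical stability
    log_post -= log_post.max()
    post = np.exp(log_post)
    return post / post.sum()
```

Evaluating this function for every frequency and time frame would give one mask value per source and bin. Because each frequency is handled independently in this sketch, the source ordering can differ across frequencies, which is the permutation ambiguity the abstract refers to.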

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis