Integration of variational autoencoder and spatial clustering for adaptive multi-channel neural speech separation

Katerina Zmolikova; Marc Delcroix; Lukáš Burget; Tomohiro Nakatani; Jan "Honza" Černocký

arXiv:2011.11984 · eess.AS · November 25, 2020 · SLT

TL;DR

This paper introduces a multi-channel speech separation method that combines a variational autoencoder model of speech with spatial clustering. It outperforms the previous factorial GMM-based model (DOLPHIN) and makes adaptation to new noise conditions easier.

Contribution

It presents a new factorial model based on a generative neural network (VAE) that integrates spectral and spatial information for improved speech separation.

Findings

01. Outperforms previous factorial GMM models (DOLPHIN)

02. Performs comparably to permutation invariant training with spatial clustering

03. Eases adaptation to new noise conditions

Abstract

In this paper, we propose a method combining a variational autoencoder model of speech with a spatial clustering approach for multi-channel speech separation. The advantage of integrating spatial clustering with a spectral model was shown in several works. As the spectral model, previous works used either factorial generative models of the mixed speech or discriminative neural networks. In our work, we combine the strengths of both approaches by building a factorial model based on a generative neural network, a variational autoencoder. By doing so, we can exploit the modeling power of neural networks while keeping a structured model. Such a model can be advantageous when adapting to new noise conditions, as only the noise part of the model needs to be modified. We show experimentally that our model significantly outperforms the previous factorial model based on Gaussian mixture…
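
The general idea of integrating a spectral model with spatial clustering can be illustrated with a minimal sketch. This is not the authors' VAE-based inference; it only assumes that, for a single time-frequency bin, each source has a spectral log-likelihood and a spatial log-likelihood, and that the two are treated as independent. The function name `source_posteriors` and the input values are hypothetical.

```python
import math


def source_posteriors(log_spectral, log_spatial):
    """Combine per-source log-likelihoods for one time-frequency bin.

    Both arguments are lists with one entry per source (hypothetical inputs;
    in practice they would come from a spectral model and a spatial model).
    Returns the posterior probability that each source dominates the bin.
    """
    if len(log_spectral) != len(log_spatial):
        raise ValueError("need one spectral and one spatial score per source")
    # Independence assumption: the joint log-likelihood is the sum of both.
    joint = [a + b for a, b in zip(log_spectral, log_spatial)]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(joint)
    weights = [math.exp(v - peak) for v in joint]
    total = sum(weights)
    return [w / total for w in weights]


# Illustrative values only: three sources (e.g. two speakers and noise).
masks = source_posteriors([-1.0, -3.0, -2.5], [-0.5, -0.2, -4.0])
print([round(m, 3) for m in masks])
```

Applied to every time-frequency bin, posteriors of this kind act as soft masks. In a structured model such as this, the score for the noise source can be replaced without touching the speech scores.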

Code & Models

Repositories

BUTSpeechFIT/vae_dolphin
PyTorch · Official
