FastMVAE2: On improving and accelerating the fast variational autoencoder-based source separation algorithm for determined mixtures

Li Li; Hirokazu Kameoka; Shoji Makino

arXiv:2109.13496 · cs.SD · September 8, 2022

TL;DR

This paper introduces a new source model and training scheme that enhance the accuracy and speed of the FastMVAE algorithm for multichannel source separation, especially in unseen data scenarios.

Contribution

The paper proposes the ChimeraACVAE model and a knowledge distillation training scheme to improve generalization and efficiency of FastMVAE.

Findings

01 Achieved better source separation performance than the original FastMVAE algorithm with less computation time.

02 Successfully separated 18 sources with good accuracy.

03 Enhanced generalization capability of the source model.

Abstract

This paper proposes a new source model and training scheme to improve the accuracy and speed of the multichannel variational autoencoder (MVAE) method. The MVAE method is a recently proposed powerful multichannel source separation method. It consists of pretraining a source model represented by a conditional VAE (CVAE) and then estimating separation matrices along with other unknown parameters so that the log-likelihood is non-decreasing given an observed mixture signal. Although the MVAE method has been shown to provide high source separation performance, one drawback is the computational cost of the backpropagation steps in the separation-matrix estimation algorithm. To overcome this drawback, a method called "FastMVAE" was subsequently proposed, which uses an auxiliary classifier VAE (ACVAE) to train the source model. By using the classifier and encoder trained in this way, the…
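The separation-matrix step mentioned in the abstract can be illustrated with a minimal sketch. In a determined mixture the number of sources equals the number of microphones, so each frequency bin has a square separation matrix that is applied to the mixture spectrogram. The function names, array shapes, and the convention of applying the matrix directly (without a conjugate transpose) are assumptions made for illustration, not taken from the paper. The pretrained CVAE source model and the iterative estimation of the matrices are not shown; here the matrices are simply the inverses of a known synthetic mixing system.

```python
import numpy as np


def apply_separation(X, W):
    """Apply per-frequency separation matrices to a mixture STFT.

    X: complex array of shape (F, N, M) = (frequency bins, frames, microphones).
    W: complex array of shape (F, M, M), one square matrix per frequency bin.
    Returns Y with shape (F, N, M), where Y[f, n] = W[f] @ X[f, n].
    """
    return np.einsum("fij,fnj->fni", W, X)


def demo(F=4, N=10, M=2, seed=0):
    """Synthetic check: mix random sources, then undo the mixing.

    The mixing matrices A are known here, so W = inv(A) is used directly.
    A real algorithm would have to estimate W from the mixture alone.
    """
    rng = np.random.default_rng(seed)
    # Hypothetical source spectrograms and per-frequency mixing matrices.
    S = rng.standard_normal((F, N, M)) + 1j * rng.standard_normal((F, N, M))
    A = rng.standard_normal((F, M, M)) + 1j * rng.standard_normal((F, M, M))
    X = np.einsum("fij,fnj->fni", A, S)  # observed mixture
    W = np.linalg.inv(A)                 # ideal separation matrices
    Y = apply_separation(X, W)
    return S, Y


if __name__ == "__main__":
    S, Y = demo()
    print("max reconstruction error:", np.abs(S - Y).max())
```

Because the matrices are square and invertible in the determined case, the ideal separator recovers the sources up to numerical error; what methods such as MVAE and FastMVAE add is a way to estimate those matrices when the mixing system is unknown.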

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis