FastMVAE2: On improving and accelerating the fast variational autoencoder-based source separation algorithm for determined mixtures
Li Li, Hirokazu Kameoka, Shoji Makino

TL;DR
This paper introduces a new source model and training scheme that enhance the accuracy and speed of the FastMVAE algorithm for multichannel source separation, especially in unseen data scenarios.
Contribution
The paper proposes the ChimeraACVAE model and a knowledge distillation training scheme to improve generalization and efficiency of FastMVAE.
Findings
Achieved better source separation performance with less computation time.
Successfully separated 18 sources with good accuracy.
Enhanced generalization capability of the source model.
Abstract
This paper proposes a new source model and training scheme to improve the accuracy and speed of the multichannel variational autoencoder (MVAE) method. The MVAE method is a recently proposed powerful multichannel source separation method. It consists of pretraining a source model represented by a conditional VAE (CVAE) and then estimating separation matrices along with other unknown parameters so that the log-likelihood is non-decreasing given an observed mixture signal. Although the MVAE method has been shown to provide high source separation performance, one drawback is the computational cost of the backpropagation steps in the separation-matrix estimation algorithm. To overcome this drawback, a method called "FastMVAE" was subsequently proposed, which uses an auxiliary classifier VAE (ACVAE) to train the source model. By using the classifier and encoder trained in this way, the…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsSpeech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis
