Target Speech Extraction Based on Blind Source Separation and X-vector-based Speaker Selection Trained with Data Augmentation

Zhaoyi Gu; Lele Liao; Kai Chen; Jing Lu

arXiv:2005.07976 · eess.AS · November 2, 2020 · 1 citation

TL;DR

This paper proposes a sequential target speech extraction method combining blind source separation and x-vector speaker recognition, enhanced by data augmentation, to improve generalization and extraction accuracy in varied acoustic environments.

Contribution

It introduces a novel combination of BSS methods with an x-vector SR module trained with data augmentation for better target speech extraction.

Findings

01

MVAE generalizes better to unseen speakers with augmented training.

02

The cascaded approach improves extraction accuracy in real-room environments.

03

Data augmentation enhances speaker recognition performance.
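
This page does not say which augmentations are used to train the speaker recognition module. As one common option, a minimal sketch of additive-noise augmentation at a chosen signal-to-noise ratio is given below. The function name add_noise_at_snr, the 10 dB target, and the synthetic sine and white-noise signals are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def add_noise_at_snr(speech, noise, snr_db):
    """Mix `noise` into `speech` so the result has the requested SNR in dB.

    Hypothetical helper for illustration; not the authors' augmentation code.
    """
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[: len(speech)]

    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # Choose the gain so that speech_power / (gain^2 * noise_power) = 10^(snr_db / 10).
    gain = np.sqrt(speech_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise


# Toy signals standing in for a clean utterance and a noise recording.
rng = np.random.default_rng(0)
speech = np.sin(2.0 * np.pi * 220.0 * np.arange(16000) / 16000.0)
noise = rng.standard_normal(16000)

noisy = add_noise_at_snr(speech, noise, snr_db=10.0)

# Measure the SNR actually achieved from the added noise component.
achieved_snr_db = 10.0 * np.log10(np.mean(speech ** 2) / np.mean((noisy - speech) ** 2))
```

Scaling the noise rather than the speech keeps the level of the clean utterance unchanged, which makes it straightforward to generate several noisy copies of the same utterance at different SNRs.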

Abstract

Extracting the desired speech from a mixture is a meaningful and challenging task. End-to-end DNN-based methods, though attractive, face the problem of generalization. In this paper, we explore a sequential approach for target speech extraction by combining blind source separation (BSS) with an x-vector-based speaker recognition (SR) module. Two promising BSS methods based on the source independence assumption, independent low-rank matrix analysis (ILRMA) and multi-channel variational autoencoder (MVAE), are utilized and compared. ILRMA employs nonnegative matrix factorization (NMF) to capture the spectral structures of source signals, while MVAE utilizes the strong modeling power of deep neural networks (DNNs). However, the investigation of MVAE has been limited to training with very few speakers, and the speech signals of test speakers are usually included in the training data. We extend the training of MVAE…
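
The selection stage of the sequential approach can be sketched as follows: after BSS produces several separated signals, each one is mapped to a speaker embedding and compared with an enrollment embedding of the target speaker, and the best-scoring output is kept. This is a minimal sketch under stated assumptions. The abstract does not specify the scoring backend, so cosine similarity is assumed here; the function names and the 3-dimensional toy vectors are hypothetical stand-ins for real x-vectors.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two 1-D embedding vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def select_target(separated_embeddings, enrollment_embedding):
    """Return the index of the separated output closest to the target speaker.

    Hypothetical helper: scoring by cosine similarity is an assumption,
    not necessarily the backend used in the paper.
    """
    scores = [cosine_similarity(e, enrollment_embedding) for e in separated_embeddings]
    return int(np.argmax(scores)), scores


# Toy embeddings standing in for x-vectors of the enrollment utterance
# and of two BSS outputs.
enrollment = np.array([1.0, 0.0, 0.5])
separated = [
    np.array([0.0, 1.0, 0.0]),  # output 0: orthogonal to the enrollment embedding
    np.array([0.9, 0.1, 0.4]),  # output 1: close to the enrollment embedding
]

best_index, scores = select_target(separated, enrollment)
```

Because the separation and selection stages are decoupled in this arrangement, either stage can be swapped (for example ILRMA for MVAE) without retraining the other.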

Code & Models

Repositories

annie-gu/MVAEBasedBSE
none · Official

Taxonomy

Topics: Blind Source Separation Techniques · Speech and Audio Processing · Speech Recognition and Synthesis