On the effectiveness of enrollment speech augmentation for Target Speaker Extraction

Junjie Li; Ke Zhang; Shuai Wang; Haizhou Li; Man-Wai Mak; Kong Aik Lee

arXiv:2409.09589 · cs.SD · September 17, 2024

TL;DR

This paper investigates the impact of augmenting the enrollment speech in target speaker extraction (TSE) and introduces a novel self-estimated speech augmentation (SSA) method that improves performance, especially when training data is limited.

Contribution

It is the first work to thoroughly analyze the effects of enrollment speech augmentation, and it proposes SSA, a new augmentation technique for TSE.

Findings

01

Augmenting enrollment speech improves TSE robustness.

02

The proposed SSA method improves performance on the Libri2Mix test set by up to 2.5 dB.

03

Enrollment augmentation is effective with both pretrained and jointly optimized speaker encoders.

Abstract

Deep learning technologies have significantly advanced the performance of target speaker extraction (TSE). Data augmentation is a commonly adopted technique for improving the generalization and robustness of these algorithms when training data is insufficient. Whereas typical data augmentation is applied to the speech mixtures, this work thoroughly investigates the effectiveness of augmenting the enrollment speech. We found that, for both pretrained and jointly optimized speaker encoders, directly augmenting the enrollment speech consistently improves performance. In addition to conventional methods such as adding noise and reverberation, we propose a novel augmentation method called self-estimated speech augmentation (SSA). Experimental results on the Libri2Mix test set show that our proposed method achieves an improvement of up to 2.5 dB.
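The abstract names noise and reverberation addition as the conventional ways to augment the enrollment speech, as opposed to the mixture. The sketch below is a minimal illustration of those two operations applied only to an enrollment waveform. It is not the authors' code. The function names (`add_noise_at_snr`, `add_reverb`, `augment_enrollment`), the SNR range, and the application probabilities are hypothetical placeholders, and the noise and room-impulse-response (RIR) sources are assumed to be supplied by the user. SSA is not sketched here, because the summary above does not specify how it works.

```python
import numpy as np

def add_noise_at_snr(speech, noise, snr_db, rng=None):
    """Mix `noise` into `speech` so that the speech-to-noise ratio equals `snr_db` (dB)."""
    rng = np.random.default_rng() if rng is None else rng
    # Tile short noise clips, then take a random crop matching the enrollment length.
    if len(noise) < len(speech):
        noise = np.tile(noise, int(np.ceil(len(speech) / len(noise))))
    start = rng.integers(0, len(noise) - len(speech) + 1)
    noise = noise[start:start + len(speech)]
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12  # epsilon avoids division by zero
    # Choose the scale so that 10*log10(speech_power / scaled_noise_power) == snr_db.
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise

def add_reverb(speech, rir):
    """Convolve speech with a peak-normalized RIR and keep the original length."""
    rir = rir / (np.max(np.abs(rir)) + 1e-12)
    return np.convolve(speech, rir)[: len(speech)]

def augment_enrollment(enroll, noises, rirs, snr_range=(0.0, 20.0),
                       p_noise=0.5, p_reverb=0.5, rng=None):
    """Hypothetical enrollment-only augmentation: optional reverb, then optional noise."""
    rng = np.random.default_rng() if rng is None else rng
    out = enroll.copy()
    if rirs and rng.random() < p_reverb:
        out = add_reverb(out, rirs[rng.integers(len(rirs))])
    if noises and rng.random() < p_noise:
        snr = rng.uniform(*snr_range)
        out = add_noise_at_snr(out, noises[rng.integers(len(noises))], snr, rng)
    return out
```

In a training loop under these assumptions, `augment_enrollment` would be called on the enrollment utterance before it reaches the speaker encoder, while the mixture and the target signal stay unchanged. Reverberation is applied before noise so that the additive noise is not itself reverberated. This ordering is a common convention, not something the paper is stated to prescribe.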

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Advanced Data Compression Techniques