Exploring Voice Conversion based Data Augmentation in Text-Dependent Speaker Verification

Xiaoyi Qin; Yaogen Yang; Lin Yang; Xuyang Wang; Junjie Wang; Ming Li

arXiv:2011.10710·cs.SD·November 24, 2020

Open Access

TL;DR

This paper investigates voice conversion techniques for data augmentation to enhance text-dependent speaker verification, demonstrating significant performance improvements with limited training data.

Contribution

It introduces the use of voice conversion methods for data augmentation in speaker verification, showing their effectiveness over simple re-sampling.

Findings

01

Equal Error Rate reduced from 6.51% to 4.51%.

02

Voice conversion-based augmentation improves verification accuracy.

03

Simple re-sampling is less effective than voice conversion methods.

Abstract

In this paper, we focus on improving the performance of text-dependent speaker verification systems when training data is limited. Deep learning based text-dependent speaker verification systems generally need a large-scale text-dependent training data set, which can be labor-intensive and costly to collect, especially for customized new wake-up words. In recent studies, voice conversion systems that can generate high-quality synthesized speech of seen and unseen speakers have been proposed. Inspired by those works, we adopt two different voice conversion methods, as well as a very simple re-sampling approach, to generate new text-dependent speech samples for data augmentation. Experimental results show that the proposed method significantly improves the Equal Error Rate from 6.51% to 4.51% in the scenario of limited training data.
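The abstract reports its headline result as an Equal Error Rate (EER), the operating point at which the false acceptance rate equals the false rejection rate. As a minimal sketch of how that metric is commonly computed from trial scores, the snippet below sweeps every observed score as a threshold. The function names `equal_error_rate` and `relative_reduction` and the toy scores are illustrative assumptions, not taken from the paper, and the authors' evaluation tooling may differ.

```python
import numpy as np


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping each observed score as a threshold.

    Hypothetical helper for illustration; higher scores are assumed to mean
    "more likely the same speaker".
    """
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target, nontarget]))

    # False rejection rate: genuine trials scoring below the threshold.
    frr = np.array([(target < t).mean() for t in thresholds])
    # False acceptance rate: impostor trials scoring at or above the threshold.
    far = np.array([(nontarget >= t).mean() for t in thresholds])

    # Take the threshold where the two error rates are closest and average them.
    idx = int(np.argmin(np.abs(far - frr)))
    return float((far[idx] + frr[idx]) / 2.0)


def relative_reduction(baseline, improved):
    """Relative reduction of an error rate, as a fraction of the baseline."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # Toy scores, invented purely to exercise the function.
    genuine = [0.9, 0.8, 0.7, 0.3]
    impostor = [0.6, 0.4, 0.2, 0.1]
    print(equal_error_rate(genuine, impostor))
    print(relative_reduction(6.51, 4.51))
```

Applied to the figures in the abstract, the drop from 6.51% to 4.51% EER corresponds to a relative reduction of roughly 30.7%.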

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing