Data augmentation enhanced speaker enrollment for text-dependent speaker verification

Achintya Kumar Sarkar; Himangshu Sarma; Priyanka Dwivedi; Zheng-Hua Tan

arXiv:2007.08004·eess.AS·March 29, 2021·1 cites

Open Access

TL;DR

This paper explores the novel use of data augmentation techniques specifically for speaker enrollment in text-dependent speaker verification, demonstrating improved robustness especially in noisy conditions.

Contribution

It introduces data augmentation methods for speaker enrollment, a less-studied area, and evaluates their effectiveness with two fusion strategies and under noisy conditions.

Findings

01 Data augmentation improves speaker enrollment performance.

02 Fusion of augmented systems enhances verification accuracy.

03 Methods are validated on the RedDots 2016 database.

Abstract

Data augmentation is commonly used to generate additional data from the available training data, so that the parameters of complex models, such as those used for speaker verification (SV), can be estimated robustly, especially in under-resourced applications. SV involves training speaker-independent (SI) models and speaker-dependent models, where each speaker is represented by a model derived from an SI model using that speaker's training data during the enrollment phase. While data augmentation for training SI models is well studied, data augmentation for speaker enrollment is rarely explored. In this paper, we propose the use of data augmentation methods for generating extra data to empower speaker enrollment. Each data augmentation method generates a new data set. Two strategies of using the data sets are explored: the first one is to train separate systems and fuse them at…
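The idea of augmenting enrollment data and then fusing separately trained systems can be illustrated with a small sketch. The excerpt above does not say which augmentation methods the paper uses or how the fusion is computed, so everything here is an assumption made for illustration: additive white noise at a chosen signal-to-noise ratio stands in for "a data augmentation method", and a weighted average of per-system scores stands in for the fusion step. The function names `add_noise_at_snr`, `measured_snr_db`, and `fuse_scores` are hypothetical.

```python
import math
import random


def add_noise_at_snr(signal, snr_db, rng):
    """Return a copy of `signal` with white Gaussian noise at `snr_db` dB SNR.

    Assumption: additive noise is used only as an example augmentation;
    the excerpt does not name the paper's actual methods.
    """
    if not signal:
        raise ValueError("signal must be non-empty")
    signal_power = sum(x * x for x in signal) / len(signal)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    noise_power = sum(n * n for n in noise) / len(noise)
    # Scale the noise so that signal_power / noise_power == 10^(snr_db / 10).
    target_noise_power = signal_power / (10 ** (snr_db / 10))
    scale = math.sqrt(target_noise_power / noise_power)
    return [x + scale * n for x, n in zip(signal, noise)]


def measured_snr_db(clean, noisy):
    """Compute the SNR in dB of `noisy` relative to `clean`."""
    noise = [y - x for x, y in zip(clean, noisy)]
    signal_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum(n * n for n in noise) / len(noise)
    return 10 * math.log10(signal_power / noise_power)


def fuse_scores(scores, weights=None):
    """Weighted average of per-system scores for one verification trial.

    Assumption: simple averaging is an illustrative fusion rule only.
    """
    if weights is None:
        weights = [1.0] * len(scores)
    if len(weights) != len(scores) or not scores:
        raise ValueError("scores and weights must be non-empty and equal length")
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


# One synthetic "enrollment utterance": a 440 Hz tone sampled at 16 kHz.
rng = random.Random(0)
enroll = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]

# Each augmentation setting yields a new data set, as in the abstract.
augmented_sets = {snr: add_noise_at_snr(enroll, snr, rng) for snr in (20, 10, 5)}

# Hypothetical scores from three systems, one per augmented data set.
fused = fuse_scores([1.0, 2.0, 3.0])
```

In this sketch each entry of `augmented_sets` would be used to enroll one system, and `fuse_scores` combines the trial scores those systems produce. Real systems would replace the synthetic tone with recorded enrollment speech and the placeholder scores with actual model outputs.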

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing