Data Generation Using Pass-phrase-dependent Deep Auto-encoders for Text-Dependent Speaker Verification

Achintya Kumar Sarkar; Md Sahidullah; Zheng-Hua Tan

arXiv:2102.02074 · cs.SD · February 4, 2021

TL;DR

This paper introduces a pass-phrase-dependent auto-encoder approach that generates augmented data for text-dependent speaker verification, improving system performance by transforming features into pass-phrase-specific spaces.

Contribution

The novel method trains pass-phrase-specific deep auto-encoders to generate augmented data, improving speaker verification accuracy over the conventional approach, which uses only the available evaluation data.

Findings

1. Improved verification performance on the RedDots dataset.
2. Effective with both cepstral and deep bottleneck features.
3. Improves both GMM-UBM and i-vector based systems.

Abstract

In this paper, we propose a novel method that trains pass-phrase-specific deep neural network (PP-DNN) based auto-encoders to create augmented data for text-dependent speaker verification (TD-SV). Each PP-DNN auto-encoder is trained on the utterances of one particular pass-phrase available in the target enrollment set, using two alternative methods: (i) transfer learning and (ii) training from scratch. Next, the feature vectors of a given utterance are fed to the PP-DNNs, and the frame-level output of each PP-DNN is taken as one new set of generated data. The generated data from each PP-DNN is then used to build a TD-SV system, in contrast to the conventional method, which uses only the available evaluation data. The proposed approach can be viewed as transforming the data into a pass-phrase-specific space through a non-linear transformation learned by each PP-DNN. The method…
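The data-generation pipeline described in the abstract can be sketched in plain Python as follows. This is a minimal, hypothetical illustration and not the authors' implementation. Each PP-DNN is stood in for by a tiny one-hidden-layer auto-encoder trained with per-frame SGD on the reconstruction error. The names `AutoEncoder`, `train`, `train_pp_dnns` and `generate_data` are invented here. We also assume that "transfer learning" means initializing every PP-DNN from a copy of a generic pre-trained auto-encoder (`base`) and then fine-tuning it, whereas "training from scratch" uses random initialization. Network sizes, learning rate and epoch counts are arbitrary.

```python
import copy
import math
import random
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Frame = List[float]  # one feature vector (e.g. a cepstral frame)


class AutoEncoder:
    """Hypothetical stand-in for one PP-DNN: x -> tanh(W1 x + b1) -> W2 h + b2."""

    def __init__(self, dim: int, hidden: int, seed: int = 0):
        rng = random.Random(seed)
        s1, s2 = 1.0 / math.sqrt(dim), 1.0 / math.sqrt(hidden)
        self.w1 = [[rng.uniform(-s1, s1) for _ in range(dim)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [[rng.uniform(-s2, s2) for _ in range(hidden)] for _ in range(dim)]
        self.b2 = [0.0] * dim

    def _hidden(self, x: Frame) -> Frame:
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.w1, self.b1)]

    def _output(self, h: Frame) -> Frame:
        return [sum(w * hj for w, hj in zip(row, h)) + b
                for row, b in zip(self.w2, self.b2)]

    def forward(self, x: Frame) -> Frame:
        """Frame-level non-linear transformation learned by this auto-encoder."""
        return self._output(self._hidden(x))

    def sgd_step(self, x: Frame, lr: float) -> float:
        """One SGD update on 0.5 * ||f(x) - x||^2; returns the loss before the update."""
        h = self._hidden(x)
        dy = [yk - xk for yk, xk in zip(self._output(h), x)]
        # Backpropagate: gradient w.r.t. hidden activations, then through tanh.
        dh = [sum(self.w2[k][j] * dy[k] for k in range(len(dy))) for j in range(len(h))]
        dz = [dhj * (1.0 - hj * hj) for dhj, hj in zip(dh, h)]
        for k, dyk in enumerate(dy):
            for j, hj in enumerate(h):
                self.w2[k][j] -= lr * dyk * hj
            self.b2[k] -= lr * dyk
        for j, dzj in enumerate(dz):
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * dzj * xi
            self.b1[j] -= lr * dzj
        return 0.5 * sum(d * d for d in dy)


def train(model: AutoEncoder, frames: List[Frame], epochs: int, lr: float, seed: int = 0) -> float:
    """Shuffled per-frame SGD; returns the mean loss of the last epoch."""
    rng = random.Random(seed)
    order = list(range(len(frames)))
    mean_loss = 0.0
    for _ in range(epochs):
        rng.shuffle(order)
        mean_loss = sum(model.sgd_step(frames[i], lr) for i in order) / len(order)
    return mean_loss


def train_pp_dnns(enrollment: List[Tuple[str, List[Frame]]], hidden: int = 8,
                  epochs: int = 50, lr: float = 0.05,
                  base: Optional[AutoEncoder] = None) -> Dict[str, AutoEncoder]:
    """Train one auto-encoder per pass-phrase on that pass-phrase's enrollment frames.

    Assumption: with `base`, each model starts from a copy of a generic pre-trained
    auto-encoder ("transfer learning"); without it, from random weights ("from scratch").
    """
    frames_by_pp: Dict[str, List[Frame]] = defaultdict(list)
    for pp_id, utterance in enrollment:
        frames_by_pp[pp_id].extend(utterance)
    models: Dict[str, AutoEncoder] = {}
    for pp_id, frames in sorted(frames_by_pp.items()):
        model = copy.deepcopy(base) if base is not None else AutoEncoder(len(frames[0]), hidden)
        train(model, frames, epochs, lr)
        models[pp_id] = model
    return models


def generate_data(models: Dict[str, AutoEncoder], utterance: List[Frame]) -> Dict[str, List[Frame]]:
    """Feed every frame through each PP-DNN; each PP-DNN's output is one generated data set."""
    return {pp_id: [m.forward(x) for x in utterance] for pp_id, m in models.items()}


if __name__ == "__main__":
    rng = random.Random(1)

    def synthetic_utterance(center: Frame, n: int = 20) -> List[Frame]:
        # Hypothetical toy "features": noisy copies of a fixed vector.
        return [[c + rng.gauss(0.0, 0.1) for c in center] for _ in range(n)]

    enrollment = [("pp1", synthetic_utterance([0.5, -0.2, 0.1, 0.0])),
                  ("pp2", synthetic_utterance([-0.3, 0.4, 0.0, 0.2]))]
    pp_dnns = train_pp_dnns(enrollment)
    generated = generate_data(pp_dnns, synthetic_utterance([0.5, -0.2, 0.1, 0.0], n=5))
    print({pp_id: len(frames) for pp_id, frames in generated.items()})  # → {'pp1': 5, 'pp2': 5}
```

Each entry of the dictionary returned by `generate_data` is one generated data set (one per pass-phrase). In the setting the abstract describes, these sets would then feed the usual TD-SV back end, such as GMM-UBM or i-vector modeling, which is not sketched here. To approximate the transfer-learning variant under the stated assumption, pre-train an `AutoEncoder` on other data with `train` and pass it as `base`.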

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing