Data Generation Using Pass-phrase-dependent Deep Auto-encoders for Text-Dependent Speaker Verification
Achintya Kumar Sarkar, Md Sahidullah, Zheng-Hua Tan

TL;DR
This paper introduces a pass-phrase-dependent auto-encoder approach that generates augmented data for text-dependent speaker verification, improving system performance by transforming features into pass-phrase specific spaces.
Contribution
The novel method trains pass-phrase specific deep auto-encoders to generate augmented data, enhancing speaker verification accuracy over traditional approaches.
Findings
Improved verification performance on the RedDots dataset.
Effective with both cepstral and deep bottleneck features.
Enhances GMM-UBM and i-vector based systems.
Abstract
In this paper, we propose a novel method that trains pass-phrase specific deep neural network (PP-DNN) based auto-encoders for creating augmented data for text-dependent speaker verification (TD-SV). Each PP-DNN auto-encoder is trained using the utterances of a particular pass-phrase available in the target enrollment set with two methods: (i) transfer learning and (ii) training from scratch. Next, feature vectors of a given utterance are fed to the PP-DNNs and the output from each PP-DNN at frame-level is considered one new set of generated data. The generated data from each PP-DNN is then used for building a TD-SV system in contrast to the conventional method that considers only the evaluation data available. The proposed approach can be considered as the transformation of data to the pass-phrase specific space using a non-linear transformation learned by each PP-DNN. The method…
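The generation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a one-hidden-layer tanh auto-encoder trained from scratch with plain gradient descent, in place of the deep networks and transfer learning the paper describes, and it uses synthetic random features rather than cepstral or bottleneck features. The names `train_autoencoder`, `transform`, `generate_augmented`, and the pass-phrase labels are hypothetical.

```python
import numpy as np


def train_autoencoder(frames, hidden=16, epochs=300, lr=0.02, seed=0):
    """Fit a one-hidden-layer tanh auto-encoder to frames (n_frames x dim).

    Assumption: a toy stand-in for a pass-phrase specific deep auto-encoder.
    Returns the parameters and the per-epoch reconstruction loss.
    """
    rng = np.random.default_rng(seed)
    n, dim = frames.shape
    W1 = rng.normal(0.0, 0.1, (dim, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, (hidden, dim))
    b2 = np.zeros(dim)
    losses = []
    for _ in range(epochs):
        # Forward pass: encode, then decode back to the feature dimension.
        h = np.tanh(frames @ W1 + b1)
        out = h @ W2 + b2
        diff = out - frames
        losses.append(0.5 * np.sum(diff ** 2) / n)
        # Backward pass for the loss 0.5 * sum(diff^2) / n.
        err = diff / n
        gW2 = h.T @ err
        gb2 = err.sum(axis=0)
        dh = (err @ W2.T) * (1.0 - h ** 2)
        gW1 = frames.T @ dh
        gb1 = dh.sum(axis=0)
        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        b2 -= lr * gb2
    return (W1, b1, W2, b2), losses


def transform(params, frames):
    """Pass frames through one trained auto-encoder, frame by frame."""
    W1, b1, W2, b2 = params
    return np.tanh(frames @ W1 + b1) @ W2 + b2


def generate_augmented(utterance, pp_models):
    """Return one generated feature set per pass-phrase model."""
    return {pp: transform(params, utterance) for pp, (params, _) in pp_models.items()}


# Synthetic stand-in data: 200 frames of 20-dim features per pass-phrase.
rng = np.random.default_rng(42)
enrollment = {
    "pp01": rng.normal(-1.0, 1.0, (200, 20)),
    "pp02": rng.normal(0.0, 1.0, (200, 20)),
    "pp03": rng.normal(1.0, 1.0, (200, 20)),
}

# One auto-encoder per pass-phrase, trained only on that pass-phrase's frames.
models = {pp: train_autoencoder(frames) for pp, frames in enrollment.items()}

# A new utterance of 50 frames yields one generated copy per pass-phrase model.
utterance = rng.normal(0.0, 1.0, (50, 20))
generated = generate_augmented(utterance, models)
for pp, feats in generated.items():
    print(pp, feats.shape)
```

Each entry of `generated` has the same shape as the input utterance, so in this sketch every generated set could be passed to a downstream speaker verification back end in the same way as the original features.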
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
