Time-Contrastive Learning Based DNN Bottleneck Features for Text-Dependent Speaker Verification

Achintya Kr. Sarkar; Zheng-Hua Tan

arXiv:1704.02373 · cs.SD · May 14, 2019 · 2 cites

Open Access

TL;DR

This paper introduces a time-contrastive learning (TCL) approach to DNN bottleneck (BN) feature extraction that exploits the temporal structure of speech to improve text-dependent speaker verification.

Contribution

It proposes a TCL-based BN feature extraction method that learns generic features from uniformly segmented speech, without speaker, pass-phrase, or phone labels, and outperforms speaker- and pass-phrase-discriminant BN features.
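In BN feature extraction generally, a DNN is first trained as a frame classifier; afterwards its output layer is discarded and the activations of a narrow hidden layer (the bottleneck) are used as frame-level features. The plain-Python sketch below shows only that forward pass. The sigmoid hidden activation, the linear bottleneck, and the helper names (`dense`, `bottleneck_features`) are illustrative assumptions, not the paper's configuration; under TCL the network would have been trained on segment-index targets rather than speaker or pass-phrase labels.

```python
import math
from typing import Callable, Sequence

# A layer is (weights, biases); weights[j] holds the input weights of unit j.
Layer = tuple[list[list[float]], list[float]]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def identity(z: float) -> float:
    return z


def dense(x: Sequence[float], layer: Layer, act: Callable[[float], float]) -> list[float]:
    # Fully connected layer: y_j = act(sum_i W[j][i] * x[i] + b[j]).
    weights, biases = layer
    return [act(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(weights, biases)]


def bottleneck_features(frame: Sequence[float], hidden: list[Layer], bottleneck: Layer) -> list[float]:
    """Forward one acoustic frame up to the bottleneck layer (hypothetical helper).

    Layers after the bottleneck, including the softmax over training classes,
    are simply not evaluated.
    """
    h = list(frame)
    for layer in hidden:
        h = dense(h, layer, sigmoid)
    # Assumption: a linear (no nonlinearity) bottleneck layer.
    return dense(h, bottleneck, identity)


if __name__ == "__main__":
    # Toy weights: no hidden layers, one bottleneck unit summing the inputs plus 0.5.
    print(bottleneck_features([1.0, 2.0], [], ([[1.0, 1.0]], [0.5])))  # [3.5]
```

In practice the weights would come from a trained network; the point here is only that the BN feature is an intermediate activation, not the classifier's output.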

Findings

01. TCL-BN features outperform existing BN features and MFCC features in text-dependent speaker verification.
02. The method exploits the quasi-stationary temporal structure of speech to learn features without speaker, pass-phrase, or phone labels.
03. Experimental results on the RedDots Challenge 2016 database support these findings.

Abstract

In this paper, we present a time-contrastive learning (TCL) based bottleneck (BN) feature extraction method for speech signals, with an application to text-dependent (TD) speaker verification (SV). It is well known that speech signals are quasi-stationary only within short intervals, and the TCL method aims to exploit this temporal structure. More specifically, it trains deep neural networks (DNNs) to discriminate temporal events obtained by uniformly segmenting speech signals, in contrast to existing DNN-based BN feature extraction methods, which train DNNs on labeled data to discriminate speakers, pass-phrases, phones, or a combination of them. In the context of speaker verification, speech data of fixed pass-phrases are used for TCL-BN training, while the pass-phrases used for TCL-BN training are excluded from SV, so that the learned features can be…
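The abstract's central idea, training a DNN to discriminate "temporal events obtained by uniformly segmenting speech signals", means the class targets come from frame position alone. The sketch below assumes each utterance is split into `num_segments` contiguous, near-equal parts and that the k-th segment of every training utterance is assigned class k. The function names and `num_segments` are hypothetical, and the abstract does not give the paper's exact segmentation settings.

```python
from typing import Sequence, TypeVar

Frame = TypeVar("Frame")


def tcl_segment_labels(num_frames: int, num_segments: int) -> list[int]:
    """Label each frame with the index of the uniform segment it falls in."""
    if num_frames <= 0 or num_segments <= 0:
        raise ValueError("num_frames and num_segments must be positive")
    # Frame t goes to segment floor(t * num_segments / num_frames), which splits
    # the utterance into num_segments contiguous, near-equal parts.
    return [t * num_segments // num_frames for t in range(num_frames)]


def tcl_training_pairs(utterances: Sequence[Sequence[Frame]], num_segments: int) -> list[tuple[Frame, int]]:
    """Pair every frame with its TCL class; no speaker or pass-phrase labels are used."""
    pairs: list[tuple[Frame, int]] = []
    for frames in utterances:
        if not frames:
            continue  # skip empty utterances instead of raising
        labels = tcl_segment_labels(len(frames), num_segments)
        pairs.extend(zip(frames, labels))
    return pairs


if __name__ == "__main__":
    print(tcl_segment_labels(10, 5))  # [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
```

The resulting (frame, class) pairs would serve as targets for a DNN classifier with a bottleneck layer, so `num_segments` directly sets the number of output classes.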

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing