Text-Independent Speaker Verification Based on Deep Neural Networks and Segmental Dynamic Time Warping

Mohamed Adel; Mohamed Afify; Akram Gaballah

arXiv:1806.09932 · cs.SD · June 27, 2018 · 1 citation

TL;DR

This paper introduces a novel text-independent speaker verification method combining deep neural network-derived d-vectors with segmental dynamic time warping, outperforming traditional i-vector and d-vector approaches on NIST 2008 data.

Contribution

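The paper proposes using d-vectors as the features for segmental dynamic time warping, so that the enrolment and test utterances are aligned and an overall distance between them is computed. On NIST 2008 data, this outperforms both the i-vector/PLDA baseline and the d-vector approach with local cosine and PLDA distances.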

Findings

01. Outperforms the conventional i-vector baseline with PLDA scores

02. Surpasses the d-vector approach with local distances based on cosine and PLDA scores

03. Score combination with the i-vector/PLDA baseline yields significant gains over both methods

Abstract

In this paper we present a new method for text-independent speaker verification that combines segmental dynamic time warping (SDTW) and the d-vector approach. The d-vectors, generated from a feed-forward deep neural network trained to distinguish between speakers, are used as features to perform alignment and hence calculate the overall distance between the enrolment and test utterances. We present results on the NIST 2008 data set for speaker verification, where the proposed method outperforms the conventional i-vector baseline with PLDA scores and outperforms the d-vector approach with local distances based on cosine and PLDA scores. Also, score combination with the i-vector/PLDA baseline leads to significant gains over both methods.
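The alignment step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each utterance is already a sequence of frame-level d-vectors (one row per frame), that the local distance is cosine distance, that SDTW is realised as band-constrained DTW run along several diagonal offsets, and that the per-band, length-normalised costs are averaged into one utterance-level distance. The function names and the parameters `radius` and `min_len` are hypothetical, and the aggregation rule is an assumption that may differ from the paper.

```python
import numpy as np


def cosine_distance_matrix(X, Y):
    """Pairwise cosine distance between rows of X (n, d) and rows of Y (m, d)."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    return 1.0 - Xn @ Yn.T


def banded_dtw(D, offset, radius):
    """Length-normalised DTW cost inside the band |(j - i) - offset| <= radius.

    The path starts where the diagonal `offset` enters the matrix and may end
    at any in-band cell on the last row or last column.
    """
    n, m = D.shape
    i0, j0 = (0, offset) if offset >= 0 else (-offset, 0)
    cost = np.full((n, m), np.inf)      # cumulative cost of the best path
    steps = np.zeros((n, m), dtype=int)  # number of cells on that path
    cost[i0, j0] = D[i0, j0]
    steps[i0, j0] = 1
    best = np.inf
    for i in range(i0, n):
        lo = max(j0, i + offset - radius)
        hi = min(m - 1, i + offset + radius)
        for j in range(lo, hi + 1):
            if (i, j) != (i0, j0):
                preds = [(i - 1, j), (i, j - 1), (i - 1, j - 1)]
                preds = [(a, b) for a, b in preds if a >= i0 and b >= j0]
                a, b = min(preds, key=lambda p: cost[p])
                if not np.isfinite(cost[a, b]):
                    continue  # cell not reachable inside the band
                cost[i, j] = cost[a, b] + D[i, j]
                steps[i, j] = steps[a, b] + 1
            if i == n - 1 or j == m - 1:
                best = min(best, cost[i, j] / steps[i, j])
    return best


def sdtw_distance(X, Y, radius=5, min_len=20):
    """Average band-wise DTW cost over diagonals spaced 2 * radius + 1 apart.

    Averaging the bands is an assumed aggregation rule, not taken from the paper.
    """
    D = cosine_distance_matrix(X, Y)
    n, m = D.shape
    if min(n, m) < min_len:
        raise ValueError("both utterances need at least min_len frames")
    step = 2 * radius + 1
    # Keep only diagonals that contain at least min_len cells.
    offsets = range(-(n - min_len), m - min_len + 1, step)
    return float(np.mean([banded_dtw(D, k, radius) for k in offsets]))
```

In a verification setting, the returned distance would be compared against a threshold tuned on held-out trials. A smaller value suggests the enrolment and test utterances come from the same speaker.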

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Time Series Analysis and Forecasting