The DKU-SMIIP System for NIST 2018 Speaker Recognition Evaluation

Danwei Cai; Weicheng Cai; Ming Li

arXiv:1907.02191 · eess.AS · July 5, 2019 · Interspeech · 1 citation

Open Access

TL;DR

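This paper describes the DKU-SMIIP system submitted to the NIST 2018 Speaker Recognition Evaluation. The system combines several front-end extractors (MFCC i-vector, DNN tandem i-vector, TDNN x-vector, and deep ResNet) with back-end modeling for variability compensation and domain adaptation under mismatched training and testing conditions.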

Contribution

The paper introduces a multi-faceted speaker recognition system combining various front-end extractors and back-end adaptations, with extended experiments on deep ResNet architectures and loss functions.

Findings

01

Actual detection costs of 0.392 on CMN2 and 0.494 on VAST evaluation data (fixed condition)

02

Primary systems combine four front-end extractors (MFCC i-vector, DNN tandem i-vector, TDNN x-vector, deep ResNet) with back-end variability compensation and domain adaptation

03

Enhanced deep ResNet performance through layer and loss function optimization

Abstract

In this paper, we present the system submission for the NIST 2018 Speaker Recognition Evaluation by the DKU Speech and Multi-Modal Intelligent Information Processing (SMIIP) Lab. We explore various kinds of state-of-the-art front-end extractors as well as back-end modeling for text-independent speaker verification. Our submitted primary systems employ multiple state-of-the-art front-end extractors, including the MFCC i-vector, the DNN tandem i-vector, the TDNN x-vector, and the deep ResNet. After the speaker embedding is extracted, we exploit several kinds of back-end modeling to perform variability compensation and domain adaptation for mismatched training and testing conditions. The final submitted system on the fixed condition obtains actual detection costs of 0.392 and 0.494 on CMN2 and VAST evaluation data, respectively. After the official evaluation, we further extend our experiments by…
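For readers unfamiliar with the metric quoted in the abstract, the following is a minimal sketch of a normalized detection cost evaluated at a fixed decision threshold. It is not taken from the paper or from the official NIST scoring tools: the function name `normalized_dcf`, the toy scores, and the parameter values (`p_target`, `c_miss`, `c_fa`) are illustrative assumptions only.

```python
def normalized_dcf(target_scores, nontarget_scores, threshold,
                   p_target, c_miss=1.0, c_fa=1.0):
    """Normalized detection cost at a fixed threshold (illustrative sketch).

    All parameter values are assumptions supplied by the caller; they are
    not the evaluation's official settings.
    """
    # Miss: a target trial whose score falls below the threshold.
    p_miss = sum(s < threshold for s in target_scores) / len(target_scores)
    # False alarm: a non-target trial scored at or above the threshold.
    p_fa = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
    # Weight on false alarms implied by the costs and the target prior.
    beta = (c_fa / c_miss) * (1.0 - p_target) / p_target
    return p_miss + beta * p_fa


# Toy scores, made up for illustration only.
targets = [2.0, 1.0, -1.0, 3.0]
nontargets = [-2.0, -3.0, 0.5, -1.5]
cost_equal_prior = normalized_dcf(targets, nontargets, threshold=0.0, p_target=0.5)
cost_low_prior = normalized_dcf(targets, nontargets, threshold=0.0, p_target=0.01)
```

With a small target prior, false alarms are weighted far more heavily than misses, which is why the same set of decisions can yield a very different cost under different operating points.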

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing