The DKU-SMIIP System for NIST 2018 Speaker Recognition Evaluation
Danwei Cai, Weicheng Cai, Ming Li

TL;DR
This paper describes the DKU-SMIIP system for the NIST 2018 Speaker Recognition Evaluation, which combines multiple front-end extractors (MFCC i-vector, DNN tandem i-vector, TDNN x-vector, and deep ResNet) with back-end variability compensation and domain adaptation for text-independent speaker verification.
Contribution
The paper presents a speaker verification system that combines several front-end extractors with back-end variability compensation and domain adaptation, and reports extended post-evaluation experiments on deep ResNet architectures and loss functions.
Findings
Actual detection costs of 0.392 on CMN2 and 0.494 on VAST evaluation data (fixed condition)
Effective use of multiple front-end extractors and domain adaptation techniques
Enhanced deep ResNet performance through layer and loss function optimization
Abstract
In this paper, we present the system submission for the NIST 2018 Speaker Recognition Evaluation by the DKU Speech and Multi-Modal Intelligent Information Processing (SMIIP) Lab. We explore various state-of-the-art front-end extractors as well as back-end modeling for text-independent speaker verification. Our submitted primary systems employ multiple state-of-the-art front-end extractors, including the MFCC i-vector, the DNN tandem i-vector, the TDNN x-vector, and the deep ResNet. After the speaker embedding is extracted, we exploit several kinds of back-end modeling to perform variability compensation and domain adaptation for mismatched training and testing conditions. The final submitted system on the fixed condition obtains actual detection costs of 0.392 and 0.494 on CMN2 and VAST evaluation data, respectively. After the official evaluation, we further extend our experiments by…
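For readers unfamiliar with the metric, the following is a minimal sketch of how a normalized actual detection cost is commonly computed from log-likelihood-ratio scores. It is not the authors' scoring code. The function name `normalized_actual_dcf`, the default costs `c_miss = c_fa = 1`, and the example `p_target` value are assumptions chosen for illustration; the actual operating points are defined by the evaluation plan and are not stated in this summary.

```python
import math


def normalized_actual_dcf(target_llrs, nontarget_llrs, p_target,
                          c_miss=1.0, c_fa=1.0):
    """Normalized detection cost at the Bayes threshold for p_target.

    Hypothetical helper: scores are assumed to be calibrated
    log-likelihood ratios (natural log), and the cost parameters
    are placeholders rather than values taken from the paper.
    """
    # Relative weight of a false alarm versus a miss.
    beta = (c_fa / c_miss) * (1.0 - p_target) / p_target
    # "Actual" cost: the threshold is fixed in advance, not tuned on the scores.
    threshold = math.log(beta)

    # Miss: a target trial scored below the threshold.
    p_miss = sum(s < threshold for s in target_llrs) / len(target_llrs)
    # False alarm: a non-target trial scored at or above the threshold.
    p_fa = sum(s >= threshold for s in nontarget_llrs) / len(nontarget_llrs)

    return p_miss + beta * p_fa


# Toy scores (made up for illustration only).
targets = [6.0, 5.0, 2.0, 8.0]
nontargets = [-3.0, 0.0, 1.0, -7.0]
cost = normalized_actual_dcf(targets, nontargets, p_target=0.01)
```

With these toy scores, one of the four target trials falls below the threshold and no non-target trial exceeds it, so the cost reduces to the miss rate alone. Because the threshold is fixed by the prior rather than chosen after the fact, poorly calibrated scores raise this cost even when the system ranks trials well.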
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
