A Unified Deep Neural Network for Speaker and Language Recognition
Fred Richardson, Douglas Reynolds, Najim Dehak

TL;DR
This paper introduces a unified deep neural network that simultaneously improves speaker and language recognition performance, achieving significant error rate reductions on major benchmark tasks.
Contribution
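It shows that a single DNN can supply both the learned feature representations and the sub-phoneme posteriors that were previously used separately, and that this unified model improves both speaker and language recognition.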
Findings
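55% reduction in EER for speaker recognition on the out-of-domain condition of the 2013 Domain Adaptation Challenge
48% reduction in EER for language recognition on the 30s test condition of the NIST 2011 Language Recognition Evaluation
Both gains are obtained with the same DNN rather than a separate network per task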
Abstract
Learned feature representations and sub-phoneme posteriors from Deep Neural Networks (DNNs) have been used separately to produce significant performance gains for speaker and language recognition tasks. In this work we show how these gains are possible using a single DNN for both speaker and language recognition. The unified DNN approach is shown to yield substantial performance improvements on the 2013 Domain Adaptation Challenge speaker recognition task (55% reduction in EER for the out-of-domain condition) and on the NIST 2011 Language Recognition Evaluation (48% reduction in EER for the 30s test condition).
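The headline numbers above are relative reductions in equal error rate (EER), the operating point at which the miss rate equals the false-alarm rate. The following is a minimal sketch of how such a figure could be computed from trial scores. The function names and the example values are illustrative assumptions, not taken from the paper, and the threshold sweep is a simple approximation rather than the scoring tool used in the evaluations.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    A trial is accepted when its score is >= the threshold, so a miss is a
    target trial scored below it and a false alarm is a non-target trial
    scored at or above it.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = None, None
    for t in thresholds:
        miss = sum(s < t for s in target_scores) / len(target_scores)
        false_alarm = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(miss - false_alarm)
        # Keep the threshold where the two error rates are closest.
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (miss + false_alarm) / 2.0
    return eer


def relative_reduction(baseline_eer, new_eer):
    """Fractional reduction of new_eer relative to baseline_eer."""
    return (baseline_eer - new_eer) / baseline_eer


# Hypothetical scores, chosen only to illustrate the calculation.
targets = [0.1, 0.6, 0.8, 0.9]
nontargets = [0.2, 0.3, 0.4, 0.7]
print(equal_error_rate(targets, nontargets))

# Hypothetical EERs: a drop from 10% to 4.5% is a 55% relative reduction.
print(relative_reduction(0.10, 0.045))
```

Note that a relative reduction depends on the baseline: a 55% reduction means the new EER is 45% of the baseline EER, not that the EER fell by 55 percentage points.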
