A Unified Deep Neural Network for Speaker and Language Recognition

Fred Richardson; Douglas Reynolds; Najim Dehak

arXiv:1504.00923·cs.CL·April 6, 2015

TL;DR

This paper shows that a single deep neural network can serve both speaker and language recognition, reducing EER by 55% on the out-of-domain condition of the 2013 Domain Adaptation Challenge and by 48% on the 30s test condition of the NIST 2011 Language Recognition Evaluation.

Contribution

It shows that the gains previously obtained by using DNN feature representations and sub-phoneme posteriors separately can be obtained from a single DNN shared by speaker and language recognition.

Findings

01

55% reduction in EER on the out-of-domain condition of the 2013 Domain Adaptation Challenge speaker recognition task

02

48% reduction in EER on the 30s test condition of the NIST 2011 Language Recognition Evaluation

03

Both gains come from one DNN used for speaker and language recognition, rather than from a separate network per task

Abstract

Learned feature representations and sub-phoneme posteriors from Deep Neural Networks (DNNs) have been used separately to produce significant performance gains for speaker and language recognition tasks. In this work we show how these gains are possible using a single DNN for both speaker and language recognition. The unified DNN approach is shown to yield substantial performance improvements on the 2013 Domain Adaptation Challenge speaker recognition task (55% reduction in EER for the out-of-domain condition) and on the NIST 2011 Language Recognition Evaluation (48% reduction in EER for the 30s test condition).
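
The abstract reports its results as relative reductions in equal error rate (EER), the operating point at which the miss rate equals the false-alarm rate. The sketch below illustrates how such a figure is typically computed. It is not the authors' scoring code: the function names `equal_error_rate` and `relative_reduction` are hypothetical, the threshold sweep is a simple approximation, and the scores in the usage lines are toy values rather than data from the paper.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping a threshold over every observed score.

    A trial is accepted when its score is >= the threshold.
    Hypothetical helper for illustration only.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = None, None
    for t in thresholds:
        # Miss rate: fraction of target trials rejected at this threshold.
        miss = sum(s < t for s in target_scores) / len(target_scores)
        # False-alarm rate: fraction of non-target trials accepted.
        false_alarm = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(miss - false_alarm)
        # Keep the threshold where the two error rates are closest.
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (miss + false_alarm) / 2.0
    return eer


def relative_reduction(baseline_eer, new_eer):
    """Relative EER reduction, e.g. 0.55 corresponds to a 55% reduction."""
    return (baseline_eer - new_eer) / baseline_eer


# Toy scores (assumed values, not from the paper).
toy_targets = [1.0, 3.0]
toy_nontargets = [0.0, 2.0]
toy_eer = equal_error_rate(toy_targets, toy_nontargets)
```

A relative reduction is computed against a baseline system's EER, so a 55% reduction means the new EER is 45% of the baseline value, not that the EER dropped by 55 percentage points.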
