Dr-Vectors: Decision Residual Networks and an Improved Loss for Speaker Recognition

Jason Pelecanos; Quan Wang; Ignacio Lopez Moreno

arXiv:2104.01989 · cs.CL · March 14, 2022

TL;DR

This paper introduces decision residual networks and an improved loss function for speaker recognition. Together they better model utterance-specific uncertainty and non-linear scoring information, leading to significant performance improvements.

Contribution

The paper proposes a novel decision residual network architecture and a modified loss function to better capture uncertainty and improve speaker recognition accuracy.
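A decision residual network can be pictured as a cosine score plus a learned correction. The pure-Python sketch below illustrates that idea under stated assumptions; it is not the authors' implementation. The class `DecisionResidualScorer`, the choice of decision-network inputs (enrollment embedding, test embedding and their cosine score), the single ReLU hidden layer and its size are all hypothetical. Concatenating the enrollment and test vectors in a fixed order lets the network treat the two sides differently, which is one way to express enroll/test asymmetry. Only the forward pass is shown; training is omitted.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


class DecisionResidualScorer:
    """Hypothetical decision residual scorer: score = cosine + residual MLP output.

    The decision network here is a single ReLU hidden layer; its input is
    [enroll, test, cosine], so swapping enroll and test can change the score.
    """

    def __init__(self, dim, hidden=8, seed=0):
        rng = random.Random(seed)
        in_dim = 2 * dim + 1  # enrollment vector, test vector, cosine score
        self.w1 = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        # Zero-initialized output layer (an assumption of this sketch):
        # the residual starts at exactly 0, so scoring starts as plain cosine.
        self.w2 = [0.0] * hidden
        self.b2 = 0.0

    def residual(self, enroll, test, cos):
        """Forward pass of the compact decision network."""
        x = list(enroll) + list(test) + [cos]
        h = [
            max(0.0, sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(self.w1, self.b1)
        ]
        return sum(w * v for w, v in zip(self.w2, h)) + self.b2

    def score(self, enroll, test):
        """Cosine score plus the learned residual correction."""
        c = cosine(enroll, test)
        return c + self.residual(enroll, test, c)


scorer = DecisionResidualScorer(dim=3)
print(scorer.score([1.0, 0.0, 0.0], [1.0, 1.0, 0.0]))
```

Because the output layer is zero-initialized, `score` initially equals the cosine score, so training would begin from the cosine baseline and only has to learn the residual. This initialization is a design choice of the sketch, not something stated in the paper.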

Findings

1. Significant performance gains with the proposed methods.
2. Effective modeling of utterance-specific uncertainty.
3. Enhanced separation of same/different speaker scores.

Abstract

Many neural network speaker recognition systems model each speaker using a fixed-dimensional embedding vector. These embeddings are generally compared using either linear or 2nd-order scoring and, until recently, did not handle utterance-specific uncertainty. In this work we propose scoring these representations in a way that can capture uncertainty, enroll/test asymmetry and additional non-linear information. This is achieved by incorporating a 2nd-stage neural network (known as a decision network) as part of an end-to-end training regimen. In particular, we propose the concept of decision residual networks, which use a compact decision network to leverage cosine scores and to model the residual signal that is needed. Additionally, we present a modification to the generalized end-to-end softmax loss function to target the separation of same/different speaker scores. We…
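For reference, the standard generalized end-to-end (GE2E) softmax loss, which the paper modifies, can be sketched in pure Python. Each utterance embedding e_ji (speaker j, utterance i) is compared to every speaker centroid c_k via a scaled cosine similarity S_ji,k = w * cos(e_ji, c_k) + b. For the utterance's own speaker, the centroid excludes the utterance itself, and the per-utterance loss is softmax cross-entropy with speaker j as the target. The paper's modification for separating same/different speaker scores is not spelled out in the abstract and is not reproduced here. The function name and the fixed values of `w` and `b` (learned parameters in practice) are illustrative assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def mean(vectors):
    """Element-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def ge2e_softmax_loss(batch, w=10.0, b=-5.0):
    """Standard GE2E softmax loss (sketch).

    batch[j][i] is the embedding of utterance i of speaker j.
    w and b are fixed here for illustration; in training they are learned (w > 0).
    """
    for utts in batch:
        if len(utts) < 2:
            raise ValueError("each speaker needs at least two utterances")
    centroids = [mean(utts) for utts in batch]
    total = 0.0
    for j, utts in enumerate(batch):
        for i, e in enumerate(utts):
            sims = []
            for k in range(len(batch)):
                if k == j:
                    # Own-speaker centroid computed without the utterance itself.
                    c = mean(utts[:i] + utts[i + 1:])
                else:
                    c = centroids[k]
                sims.append(w * cosine(e, c) + b)
            # Softmax cross-entropy with the true speaker j as the target.
            log_sum = math.log(sum(math.exp(s) for s in sims))
            total += log_sum - sims[j]
    return total
```

Usage: with two speakers whose utterances cluster tightly, for example `[[[1.0, 0.0], [0.9, 0.1]], [[0.0, 1.0], [0.1, 0.9]]]`, the loss is close to zero. Mixing the speakers' utterances raises it sharply.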

Code & Models

Repositories

google/speaker-id/tree/master/lingvo (official)

Taxonomy

Methods: Softmax