Improving Speaker Identification for Shared Devices by Adapting Embeddings to Speaker Subsets

Zhenning Tan; Yuguang Yang; Eunjung Han; Andreas Stolcke

arXiv:2109.02576·eess.AS·February 22, 2022

TL;DR

This paper introduces a household-adapted nonlinear embedding mapping that forms more distinct speaker clusters within a household, reducing speaker identification EER on shared devices by 45-71% in simulated households and by 49.2% on real-world internal data.

Contribution

The paper proposes a household-adapted nonlinear mapping of speaker embeddings to a low dimensional space, which complements the global scoring metric and better discriminates among household members sharing a device.

Findings

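01. EER is reduced by 45-71% in simulated households with 2 to 7 hard-to-discriminate speakers per household.

02. On real-world internal data, EER is reduced by 49.2%.

03. In t-SNE visualizations, household-adapted embeddings form more compact clusters.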
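
The findings above are stated in terms of EER (equal error rate), the operating point at which the false accept rate equals the false reject rate. The sketch below is a minimal illustration of how such a number and a reduction in it can be computed. It assumes the reported percentages are relative reductions, which this summary does not state explicitly. The function names and the toy scores are hypothetical and are not taken from the paper.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping every observed score as a threshold."""
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = None, None
    for t in thresholds:
        # False reject: a same-speaker trial scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False accept: a different-speaker trial scored at or above it.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


def relative_reduction(baseline, improved):
    """Relative reduction of an error rate, e.g. 0.10 -> 0.05 gives 0.5."""
    return (baseline - improved) / baseline


# Toy scores, made up for illustration only.
targets = [0.9, 0.8, 0.7, 0.4]
nontargets = [0.6, 0.5, 0.3, 0.2]
print(equal_error_rate(targets, nontargets))  # 0.25
print(relative_reduction(0.10, 0.05))         # 0.5
```

With only a handful of trials the threshold sweep is coarse; real evaluations use many more trials and often interpolate between thresholds.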

Abstract

Speaker identification typically involves three stages. First, a front-end speaker embedding model is trained to embed utterance and speaker profiles. Second, a scoring function is applied between a runtime utterance and each speaker profile. Finally, the speaker is identified using nearest neighbor according to the scoring metric. To better distinguish speakers sharing a device within the same household, we propose a household-adapted nonlinear mapping to a low dimensional space to complement the global scoring metric. The combined scoring function is optimized on labeled or pseudo-labeled speaker utterances. With input dropout, the proposed scoring model reduces EER by 45-71% in simulated households with 2 to 7 hard-to-discriminate speakers per household. On real-world internal data, the EER reduction is 49.2%. From t-SNE visualization, we also show that clusters formed by…
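
The scoring and identification stages described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: embeddings are taken as given (stage one), cosine similarity stands in for the global scoring metric, the household-adapted mapping is a hypothetical single `tanh` layer, and the two scores are combined by a hypothetical weight `alpha`. The abstract does not specify any of these choices, and training of the mapping (including input dropout) is omitted.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def adapt(embedding, weights):
    """Hypothetical household-specific nonlinear map to a low dimensional space.

    `weights` has one row per output dimension (assumed already trained).
    """
    return [math.tanh(sum(w * x for w, x in zip(row, embedding)))
            for row in weights]


def combined_score(utt, profile, weights, alpha=0.5):
    """Stage two: global score plus a score in the household-adapted space."""
    global_score = cosine(utt, profile)
    local_score = cosine(adapt(utt, weights), adapt(profile, weights))
    return alpha * global_score + (1 - alpha) * local_score


def identify(utt, profiles, weights, alpha=0.5):
    """Stage three: nearest neighbor, i.e. the profile with the highest score."""
    return max(profiles,
               key=lambda name: combined_score(utt, profiles[name], weights, alpha))


# Toy household with two similar speakers; all values are made up.
profiles = {"alice": [1.0, 0.0, 0.2], "bob": [0.9, 0.1, 0.3]}
weights = [[1.0, -1.0, 0.0], [0.0, 1.0, -1.0]]  # maps 3 dimensions to 2
utt = [1.0, 0.0, 0.2]
print(identify(utt, profiles, weights))  # alice
```

The weighted sum is only one way to let a household-specific score complement a global one; the paper optimizes its combined scoring function on labeled or pseudo-labeled utterances rather than fixing the combination by hand.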
