The 2021 NIST Speaker Recognition Evaluation

Seyed Omid Sadjadi; Craig Greenberg; Elliot Singer; Lisa Mason; Douglas Reynolds

arXiv:2204.10242·eess.AS·April 22, 2022·1 cites

Open Access

TL;DR

The 2021 NIST Speaker Recognition Evaluation introduced new multimodal and multilingual challenges, assessed system performance across audio and visual modalities, and demonstrated the effectiveness of neural network architectures and data augmentation techniques.

Contribution

This paper provides a comprehensive overview of the SRE21 evaluation, highlighting new challenges, data, and the performance of various systems in a large-scale multimodal speaker recognition context.

Findings

01 Audio-visual fusion improves performance significantly.

02 Neural network architectures like ResNet enhance recognition accuracy.

03 Complex training techniques contribute to performance gains.

Abstract

The 2021 Speaker Recognition Evaluation (SRE21) was the latest cycle of the ongoing evaluation series conducted by the U.S. National Institute of Standards and Technology (NIST) since 1996. It was the second large-scale multimodal speaker/person recognition evaluation organized by NIST (the first one being SRE19). Similar to SRE19, it featured two core evaluation tracks, namely audio and audio-visual, as well as an optional visual track. In addition to offering fixed and open training conditions, it also introduced new challenges for the community, thanks to a new multimodal (i.e., audio, video, and selfie images) and multilingual (i.e., with multilingual speakers) corpus, termed WeCanTalk, collected outside North America by the Linguistic Data Consortium (LDC). These challenges included: 1) trials (target and non-target) with enrollment and test segments originating from different…

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing