The NIST CTS Speaker Recognition Challenge

Seyed Omid Sadjadi; Craig Greenberg; Elliot Singer; Lisa Mason; Douglas Reynolds

arXiv:2204.10228·eess.AS·April 22, 2022

TL;DR

The NIST CTS Speaker Recognition Challenge evaluates speaker recognition systems using telephony data, highlighting recent advances in neural network architectures, data augmentation, and fine-tuning that have significantly improved performance.

Contribution

This paper provides an overview of the CTS Challenge, including system performance analyses and the impact of recent methodological improvements in speaker recognition.

Findings

01. Significant performance improvements with ResNet-based embeddings.

02. Impact of extensive data augmentation on system accuracy.

03. Benefits of long-duration fine-tuning for speaker recognition.

Abstract

The US National Institute of Standards and Technology (NIST) has been conducting a second iteration of the Conversational Telephone Speech (CTS) Challenge since August 2020. The current iteration of the CTS Challenge is a leaderboard-style speaker recognition evaluation using telephony data extracted from the unexposed portions of the Call My Net 2 (CMN2) and Multi-Language Speech (MLS) corpora collected by the Linguistic Data Consortium (LDC). The CTS Challenge is currently organized in a similar manner to the SRE19 CTS Challenge, offering only an open training condition using two evaluation subsets, namely Progress and Test. Unlike in the SRE19 Challenge, no training or development set was initially released, and NIST has publicly released the leaderboards on both subsets for the CTS Challenge. Which subset (i.e., Progress or Test) a trial belongs to is unknown to challenge participants, and each system submission needs to contain outputs for all…
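The submission protocol described in the abstract (participants score every trial without knowing which subset it belongs to, and the organizer reports results per subset) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names, the toy scores, the layout of the hidden key, and the use of equal error rate (EER) as the metric are hypothetical and are not taken from the abstract.

```python
from typing import Dict, List, Tuple


def equal_error_rate(target_scores: List[float], nontarget_scores: List[float]) -> float:
    """Approximate EER by sweeping a threshold over the observed scores.

    EER is an assumed metric here; the abstract does not name one.
    """
    best_gap, eer = float("inf"), 1.0
    for threshold in sorted(set(target_scores) | set(nontarget_scores)):
        # A trial is accepted when its score is >= threshold.
        miss = sum(s < threshold for s in target_scores) / len(target_scores)
        false_alarm = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(miss - false_alarm)
        if gap < best_gap:
            best_gap, eer = gap, (miss + false_alarm) / 2
    return eer


def score_submission(
    submission: Dict[str, float],
    hidden_key: Dict[str, Tuple[str, bool]],
) -> Dict[str, float]:
    """Score one submission separately on each hidden subset.

    hidden_key maps trial id -> (subset name, is_target). Only the
    organizer holds it, so a submission must cover every trial.
    """
    missing = set(hidden_key) - set(submission)
    if missing:
        raise ValueError(f"submission is missing {len(missing)} trial(s)")

    per_subset: Dict[str, Tuple[List[float], List[float]]] = {}
    for trial_id, (subset, is_target) in hidden_key.items():
        targets, nontargets = per_subset.setdefault(subset, ([], []))
        (targets if is_target else nontargets).append(submission[trial_id])

    return {
        subset: equal_error_rate(targets, nontargets)
        for subset, (targets, nontargets) in per_subset.items()
    }


# Toy data, invented purely for illustration.
hidden_key = {
    "t1": ("progress", True),
    "t2": ("progress", False),
    "t3": ("test", True),
    "t4": ("test", False),
    "t5": ("progress", True),
    "t6": ("test", False),
}
submission = {"t1": 2.1, "t2": -1.0, "t3": 0.5, "t4": 1.2, "t5": 1.5, "t6": -0.3}

results = score_submission(submission, hidden_key)
for subset, eer in sorted(results.items()):
    print(f"{subset}: EER = {eer:.2f}")
```

Because the subset label lives only in the organizer's key, a participant cannot tune a system against one subset alone, which is presumably the reason for requiring outputs on every trial.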

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and Audio Processing