THUEE system description for NIST 2020 SRE CTS challenge
Yu Zheng, Jinghan Peng, Miao Zhao, Yufeng Ma, Min Liu, Xinyue Ma, Tianyu Liang, Tianlong Kong, Liang He, Minqiang Xu

TL;DR
This paper describes the THUEE system for the NIST 2020 SRE CTS challenge, which fuses ResNet74, ResNet152, and RepVGG-B2 speaker embedding extractors trained with the CM-Softmax loss and ranked 1st in the challenge.
Contribution
The paper introduces a multi-architecture speaker recognition system trained with a two-stage strategy and the CM-Softmax loss, a combination of AM-Softmax and AAM-Softmax, which ranked 1st in the challenge.
Findings
Achieved 1st place in NIST 2020 SRE CTS challenge
Developed effective speaker embedding extractors with ResNet and RepVGG architectures
Implemented a two-stage training process with CM-Softmax loss
Abstract
This paper presents the system description of the THUEE team for the NIST 2020 Speaker Recognition Evaluation (SRE) conversational telephone speech (CTS) challenge. Subsystems based on ResNet74, ResNet152, and RepVGG-B2 are developed as speaker embedding extractors in this evaluation. We use a loss function that combines AM-Softmax and AAM-Softmax, namely CM-Softmax. We adopt a two-stage training strategy to further improve system performance. We fuse all individual systems as our final submission. Our approach leads to excellent performance and ranks 1st in the challenge.
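The abstract says that CM-Softmax combines AM-Softmax and AAM-Softmax but does not give the formula. The sketch below shows one plausible form, in which the target-class logit becomes s * (cos(theta + m_angular) - m_additive). That formula, the function name cm_softmax_loss, and the default values for scale, m_angular, and m_additive are all assumptions made for illustration, not the authors' implementation or settings.

```python
import numpy as np

def cm_softmax_loss(embeddings, weights, labels,
                    scale=32.0, m_angular=0.2, m_additive=0.1):
    """Combined-margin softmax loss (illustrative sketch, not the paper's code).

    embeddings: (batch, dim) speaker embeddings
    weights:    (num_speakers, dim) class weight vectors
    labels:     (batch,) integer speaker indices
    """
    # L2-normalise both sides so the dot product is cos(theta).
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = np.clip(emb @ w.T, -1.0, 1.0)

    rows = np.arange(len(labels))
    theta_target = np.arccos(cos[rows, labels])

    # Assumed combination: angular margin (AAM-Softmax style) inside the
    # cosine, additive margin (AM-Softmax style) outside it. The usual
    # handling of theta + m_angular > pi is omitted for brevity.
    logits = cos.copy()
    logits[rows, labels] = np.cos(theta_target + m_angular) - m_additive
    logits *= scale

    # Numerically stable log-softmax followed by mean cross-entropy.
    logits -= logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_prob[rows, labels].mean())
```

Setting both margins to zero reduces this to a plain normalised softmax loss. Non-zero margins lower the target logit, so the same batch gives a larger loss and the network must separate speakers by a wider angle.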
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
