The DKU-DukeECE Systems for VoxCeleb Speaker Recognition Challenge 2020
Weiqing Wang, Danwei Cai, Xiaoyi Qin, Ming Li

TL;DR
This paper details the DKU-DukeECE team's system submissions for the VoxCeleb Speaker Recognition Challenge 2020, exploring advanced front-end extractors, self-supervised learning, and speaker diarization pipelines.
Contribution
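It describes the team's systems for three tracks of the challenge: combinations of state-of-the-art front-end extractors with different pooling layers and loss functions (track 1), an iterative self-supervised learning framework for speaker representations (track 3), and a complete speaker diarization pipeline (track 4).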
Findings
Track 1: advanced front-end extractors improved speaker recognition accuracy.
Track 3: the iterative self-supervised learning framework was effective for learning speaker representations.
Track 4: the speaker diarization pipeline was robust, with high clustering accuracy.
Abstract
In this paper, we present the system submission for the VoxCeleb Speaker Recognition Challenge 2020 (VoxSRC-20) by the DKU-DukeECE team. For track 1, we explore various kinds of state-of-the-art front-end extractors with different pooling layers and objective loss functions. For track 3, we employ an iterative framework for self-supervised speaker representation learning based on a deep neural network (DNN). For track 4, we investigate the whole system pipeline for speaker diarization, including voice activity detection (VAD), uniform segmentation, speaker embedding extraction, and clustering.
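The uniform segmentation step of the track 4 pipeline can be illustrated with a minimal sketch. It assumes the VAD output is a list of (start, end) speech regions in seconds and cuts each region into fixed-length, overlapping windows that would then be passed to the speaker embedding extractor. The function name `uniform_segments` and the window, shift, and minimum-length values are illustrative assumptions, not settings taken from the paper.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds


def uniform_segments(
    speech_regions: List[Segment],
    window: float = 1.5,   # assumed window length, not from the paper
    shift: float = 0.75,   # assumed window shift, not from the paper
    min_len: float = 0.5,  # assumed minimum length for a kept segment
) -> List[Segment]:
    """Cut each VAD speech region into fixed-length, overlapping windows."""
    segments: List[Segment] = []
    for start, end in speech_regions:
        t = start
        while t < end:
            # The last window in a region is truncated at the region end.
            seg_end = min(t + window, end)
            if seg_end - t >= min_len:
                segments.append((t, seg_end))
            if seg_end >= end:
                break
            t += shift
    return segments


if __name__ == "__main__":
    # Two speech regions as a VAD might produce them (hypothetical values).
    for seg in uniform_segments([(0.0, 3.0), (5.0, 5.3)]):
        print(seg)
```

In a full pipeline each returned segment would get one speaker embedding, and the embeddings would then be clustered to assign speaker labels. Regions shorter than `min_len` are dropped in this sketch, which is one possible design choice among several.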
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
