The DKU-DukeECE Systems for VoxCeleb Speaker Recognition Challenge 2020

Weiqing Wang; Danwei Cai; Xiaoyi Qin; Ming Li

arXiv:2010.12731 · eess.AS · October 27, 2020 · 22 cites

Open Access

TL;DR

This paper details the DKU-DukeECE team's system submissions for the VoxCeleb Speaker Recognition Challenge 2020, exploring advanced front-end extractors, self-supervised learning, and speaker diarization pipelines.

Contribution

The paper combines state-of-the-art front-end extractors with different pooling layers and loss functions (track 1), an iterative self-supervised speaker representation learning framework (track 3), and a full speaker diarization pipeline (track 4) for the challenge.

Findings

01. Improved speaker recognition accuracy with advanced front-end extractors.

02. Effective self-supervised learning framework for speaker representation.

03. Robust speaker diarization pipeline with high clustering accuracy.

Abstract

In this paper, we present the system submission for the VoxCeleb Speaker Recognition Challenge 2020 (VoxSRC-20) by the DKU-DukeECE team. For track 1, we explore various kinds of state-of-the-art front-end extractors with different pooling layers and objective loss functions. For track 3, we employ an iterative framework for self-supervised speaker representation learning based on a deep neural network (DNN). For track 4, we investigate the whole system pipeline for speaker diarization, including voice activity detection (VAD), uniform segmentation, speaker embedding extraction, and clustering.
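The track 4 pipeline named in the abstract (VAD, uniform segmentation, speaker embedding extraction, clustering) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and not the authors' configuration: the energy-threshold VAD, the 1.5 s window with 0.75 s hop, the cosine-similarity agglomerative clustering, and the 0.5 stopping threshold. The function names are hypothetical. The embedding extractor itself is omitted; the clustering step simply takes one embedding vector per segment as input.

```python
import numpy as np

def energy_vad(signal, sr, frame_s=0.025, threshold_db=-35.0):
    """Toy energy VAD (assumed): one boolean per non-overlapping frame."""
    frame = int(round(sr * frame_s))
    n = len(signal) // frame
    frames = signal[: n * frame].reshape(n, frame)
    energy_db = 10.0 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
    return energy_db > threshold_db

def speech_regions(mask, frame_s=0.025):
    """Convert the frame mask into (start, end) speech regions in seconds."""
    regions, start = [], None
    for i, is_speech in enumerate(mask):
        if is_speech and start is None:
            start = i
        elif not is_speech and start is not None:
            regions.append((start * frame_s, i * frame_s))
            start = None
    if start is not None:
        regions.append((start * frame_s, len(mask) * frame_s))
    return regions

def uniform_segments(regions, win_s=1.5, hop_s=0.75):
    """Cut each speech region into fixed-length, overlapping windows."""
    segments = []
    for start, end in regions:
        t = start
        while t < end:
            segments.append((t, min(t + win_s, end)))
            if t + win_s >= end:  # this window already reaches the region end
                break
            t += hop_s
    return segments

def cosine_ahc(embeddings, threshold=0.5):
    """Average-linkage agglomerative clustering on cosine similarity.

    Merging stops once the best pair of clusters falls below `threshold`.
    """
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = x @ x.T
    clusters = [[i] for i in range(len(x))]
    while len(clusters) > 1:
        best, pair = -np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                s = sim[np.ix_(clusters[a], clusters[b])].mean()
                if s > best:
                    best, pair = s, (a, b)
        if best < threshold:
            break
        a, b = pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = np.empty(len(x), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels
```

In use, the segments returned by `uniform_segments(speech_regions(energy_vad(signal, sr)))` would each be passed through a speaker embedding network, and the stacked embeddings handed to `cosine_ahc` to obtain one speaker label per segment.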

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing