USTC-NELSLIP System Description for DIHARD-III Challenge

Yuxuan Wang; Maokui He; Shutong Niu; Lei Sun; Tian Gao; Xin Fang; Jia Pan; Jun Du; Chin-Hui Lee

arXiv:2103.10661 · cs.SD · March 22, 2021 · 20 citations

TL;DR

This paper describes a diarization system for the DIHARD-III challenge that combines several front-end techniques (speech separation, TS-VAD, and domain-dependent processing), achieving DERs of 11.30% in track 1 and 16.78% in track 2 on the evaluation set.

Contribution

On top of a conventional clustering-based diarization system, the authors combine front-end techniques (speech separation and TS-VAD with iterative data purification) with domain-dependent processing driven by audio domain classification.

Findings

01. Achieved a DER of 11.30% in track 1 (evaluation set)
02. Achieved a DER of 16.78% in track 2 (evaluation set)
03. Demonstrated the effectiveness of combining multiple front-end techniques
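Both findings are reported as diarization error rate (DER): the sum of missed speech, false-alarm speech, and speaker-confusion time, divided by the total reference speech time. The sketch below is a toy, frame-based illustration of that definition, assuming per-frame sets of active speakers, overlap included and no forgiveness collar. The function name `frame_der` and the input representation are hypothetical; the challenge itself is scored with its official tooling (dscore) on RTTM files, not with this code.

```python
from itertools import permutations


def frame_der(ref, hyp):
    """Toy frame-level diarization error rate (DER).

    ref, hyp: equal-length lists; each item is the set of speaker labels
    active in that frame (an empty set means non-speech). Overlapping
    speech is allowed and no forgiveness collar is applied.
    """
    if len(ref) != len(hyp):
        raise ValueError("ref and hyp must have the same number of frames")
    total = sum(len(r) for r in ref)  # total reference speaker-frames
    if total == 0:
        raise ValueError("reference contains no speech")

    ref_spk = list({s for frame in ref for s in frame})
    hyp_spk = list({s for frame in hyp for s in frame})

    # Try every one-to-one mapping of hypothesis speakers onto reference
    # speakers; padding with None lets a hypothesis speaker stay unmapped.
    # Brute force is only practical for a handful of speakers.
    candidates = ref_spk + [None] * len(hyp_spk)
    best_correct = 0
    for assignment in permutations(candidates, len(hyp_spk)):
        mapping = dict(zip(hyp_spk, assignment))
        correct = sum(len(r & {mapping[h] for h in h_frame})
                      for r, h_frame in zip(ref, hyp))
        best_correct = max(best_correct, correct)

    # Missed and false-alarm speech do not depend on the mapping, so the
    # mapping that maximizes correct speaker-frames also minimizes DER.
    missed = sum(max(0, len(r) - len(h)) for r, h in zip(ref, hyp))
    false_alarm = sum(max(0, len(h) - len(r)) for r, h in zip(ref, hyp))
    confusion = sum(min(len(r), len(h)) for r, h in zip(ref, hyp)) - best_correct
    return (missed + false_alarm + confusion) / total


ref = [{"A"}, {"A"}, {"B"}, {"A", "B"}, set()]
hyp = [{"1"}, {"1"}, {"2"}, {"1"}, {"2"}]
print(frame_der(ref, hyp))  # 0.4: one missed and one false-alarm frame out of 5
```

Because speaker labels are arbitrary, DER is always computed under the best speaker mapping; here "1" maps to "A" and "2" to "B", leaving only the missed overlap frame and the false alarm in the last frame.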

Abstract

This document describes our submission to the Third DIHARD Speech Diarization Challenge. Besides the traditional clustering-based system, the innovation of our system lies in combining various front-end techniques to solve the diarization problem, including speech separation and target-speaker voice activity detection (TS-VAD), combined with iterative data purification. We also adopted audio domain classification to design domain-dependent processing. Finally, we performed post-processing for system fusion and selection. Our best system achieved DERs of 11.30% in track 1 and 16.78% in track 2 on the evaluation set.
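The abstract builds on a "traditional clustering based system" without detailing it here. A common recipe of this kind splits audio into short segments, extracts a speaker embedding for each, and merges segments with agglomerative hierarchical clustering (AHC) until no pair is similar enough. The sketch below is a generic average-linkage AHC on cosine similarity with a stopping threshold; the function names (`cosine`, `ahc`), the toy 2-D embeddings, and the threshold value are illustrative assumptions, not the USTC-NELSLIP configuration.

```python
import math


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def ahc(embeddings, threshold):
    """Average-linkage AHC; returns one cluster label per segment.

    Merging stops when the most similar pair of clusters falls below
    `threshold` (a hypothetical tuning parameter).
    """
    sim = [[cosine(a, b) for b in embeddings] for a in embeddings]
    clusters = [[i] for i in range(len(embeddings))]

    def linkage(c1, c2):
        # Average pairwise similarity between members of two clusters.
        return sum(sim[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        best, pair = -math.inf, None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                s = linkage(clusters[p], clusters[q])
                if s > best:
                    best, pair = s, (p, q)
        if best < threshold:
            break
        p, q = pair  # q > p, so deleting q leaves index p valid
        clusters[p].extend(clusters[q])
        del clusters[q]

    labels = [0] * len(embeddings)
    for k, members in enumerate(clusters):
        for i in members:
            labels[i] = k
    return labels


segments = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(ahc(segments, threshold=0.5))  # [0, 0, 1, 1]
```

The naive pair search is cubic or worse in the number of segments, so real systems typically rely on a library routine such as SciPy's hierarchical clustering. Note that plain clustering assigns each segment a single speaker; handling overlapped speech is one motivation for front ends such as separation and TS-VAD.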

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing