The DKU-Duke-Lenovo System Description for the Third DIHARD Speech Diarization Challenge

Weiqing Wang; Qingjian Lin; Danwei Cai; Lin Yang; Ming Li

arXiv:2102.03649·eess.AS·February 9, 2021·1 cites

Open Access

TL;DR

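This paper describes the DKU-Duke-Lenovo speaker diarization system for the third DIHARD challenge. The system chains voice activity detection (VAD), segmentation, speaker embedding extraction, attentive similarity scoring, and agglomerative hierarchical clustering, and adds target speaker VAD (TSVAD) for phone call data. It reaches a DER of 15.43% on the core evaluation set of task 1.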

Contribution

The paper describes a multi-module diarization pipeline for the third DIHARD challenge and applies target speaker VAD (TSVAD) to phone call data to further improve performance.

Findings

01

Achieved a DER of 15.43% on the core evaluation set and 13.39% on the full evaluation set for task 1.

02

Improved diarization performance with target speaker VAD on phone call data.

03

On task 2, the system achieved a DER of 21.63% on the core evaluation set and 18.90% on the full evaluation set.
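
For readers unfamiliar with the metric, diarization error rate (DER) is conventionally the sum of missed speech, false alarm speech, and speaker confusion time, divided by the total reference speech time. The sketch below only illustrates that arithmetic. The function name and the durations are hypothetical and are not taken from the paper, and it does not reproduce the challenge's official scoring tool.

```python
def diarization_error_rate(missed, false_alarm, confusion, total_reference):
    """Return DER as a fraction of total reference speech time.

    All arguments are durations in the same unit (for example, seconds).
    """
    if total_reference <= 0:
        raise ValueError("total_reference must be positive")
    return (missed + false_alarm + confusion) / total_reference


# Hypothetical durations in seconds, chosen only for illustration.
example_der = diarization_error_rate(
    missed=30.0, false_alarm=20.0, confusion=50.0, total_reference=1000.0
)
print(f"DER = {example_der:.2%}")  # prints "DER = 10.00%"
```

Because false alarms count toward the numerator but not the denominator, DER can exceed 100% on recordings with little reference speech.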

Abstract

In this paper, we present the system submitted to the third DIHARD Speech Diarization Challenge by the DKU-Duke-Lenovo team. Our system consists of several modules: voice activity detection (VAD), segmentation, speaker embedding extraction, attentive similarity scoring, and agglomerative hierarchical clustering. In addition, target speaker VAD (TSVAD) is used for the phone call data to further improve performance. Our final submitted system achieves a DER of 15.43% on the core evaluation set and 13.39% on the full evaluation set for task 1, and a DER of 21.63% on the core evaluation set and 18.90% on the full evaluation set for task 2.
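
The clustering stage named in the abstract can be illustrated with a minimal sketch of agglomerative hierarchical clustering over a pairwise similarity matrix. The abstract does not state the linkage rule or stopping criterion, so average linkage and a fixed similarity threshold are assumptions made here. The function name `ahc_cluster` and the hand-written matrix are hypothetical; in the described system the similarities would come from the attentive similarity scoring module.

```python
def ahc_cluster(sim, threshold):
    """Average-linkage agglomerative clustering (assumed linkage, not from the paper).

    sim: symmetric list of lists; sim[i][j] is the similarity of segments i and j.
    threshold: merging stops once the best average similarity drops below it.
    Returns one cluster label per segment.
    """
    clusters = [[i] for i in range(len(sim))]

    def avg_sim(a, b):
        # Mean pairwise similarity between two clusters.
        return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best, best_pair = None, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = avg_sim(clusters[x], clusters[y])
                if best is None or s > best:
                    best, best_pair = s, (x, y)
        if best < threshold:
            break
        x, y = best_pair
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]

    labels = [0] * len(sim)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Hypothetical similarities for four segments: 0 and 1 are alike, 2 and 3 are alike.
example_sim = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]
example_labels = ahc_cluster(example_sim, threshold=0.5)
print(example_labels)  # prints [0, 0, 1, 1]
```

The threshold sets the estimated number of speakers: a higher value leaves more clusters, a lower value merges more segments into each speaker.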

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Phonetics and Phonology Research