The Hitachi-JHU DIHARD III System: Competitive End-to-End Neural Diarization and X-Vector Clustering Systems Combined by DOVER-Lap

Shota Horiguchi; Nelson Yalta; Paola Garcia; Yuki Takashima; Yawen Xue; Desh Raj; Zili Huang; Yusuke Fujita; Shinji Watanabe; Sanjeev Khudanpur

arXiv:2102.01363·eess.AS·February 3, 2021·27 cites

TL;DR

This paper describes the Hitachi-JHU system for the DIHARD III challenge, combining five subsystems through DOVER-Lap to achieve competitive diarization error rates and secure second place in all tasks.

Contribution

The paper presents an ensemble of five diarization subsystems (two x-vector-based, two end-to-end neural diarization-based, and one hybrid), each refined to be competitive and complementary, and combined using DOVER-Lap.

Findings

01

Achieved diarization error rates of 11.58% and 14.09% on Track 1 full and core, respectively.

02

Achieved diarization error rates of 16.94% and 20.01% on Track 2 full and core, respectively.

03

Secured second place in all challenge tasks.

Abstract

This paper provides a detailed description of the Hitachi-JHU system that was submitted to the Third DIHARD Speech Diarization Challenge. The system outputs the ensemble results of five subsystems: two x-vector-based subsystems, two end-to-end neural diarization-based subsystems, and one hybrid subsystem. We refine each subsystem so that all five become competitive and complementary. After the DOVER-Lap-based system combination, the system achieved diarization error rates of 11.58% and 14.09% in Track 1 full and core, and 16.94% and 20.01% in Track 2 full and core, respectively. With these results, we won second place in all the tasks of the challenge.
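For readers unfamiliar with the metric quoted above, the following is a minimal sketch of a frame-level diarization error rate: missed speech, false alarm, and speaker confusion summed over frames and divided by the total reference speaker time. It is an illustration only and not the challenge's scoring tool. The function name `frame_der`, the 0/1 frame representation, and the brute-force speaker mapping are assumptions made here, and the sketch assumes both inputs have the same number of speaker columns (pad with zeros if not).

```python
from itertools import permutations


def frame_der(ref, hyp):
    """Frame-level diarization error rate (illustrative sketch).

    ref, hyp: equal-length lists of frames; each frame is a tuple of 0/1
    flags, one per speaker. Hypothetical helper, not an official scorer.
    """
    n_spk = len(ref[0])
    total_ref = sum(sum(frame) for frame in ref)
    if total_ref == 0:
        raise ValueError("reference contains no speech")

    best_errors = None
    # Try every mapping of hypothesis speakers onto reference speakers
    # and keep the one that yields the fewest errors.
    for perm in permutations(range(n_spk)):
        errors = 0
        for r, h in zip(ref, hyp):
            h_mapped = [h[p] for p in perm]
            n_ref, n_hyp = sum(r), sum(h_mapped)
            n_correct = sum(1 for a, b in zip(r, h_mapped) if a and b)
            miss = max(0, n_ref - n_hyp)                # speakers not detected
            false_alarm = max(0, n_hyp - n_ref)         # extra speakers detected
            confusion = min(n_ref, n_hyp) - n_correct   # wrong speaker label
            errors += miss + false_alarm + confusion
        if best_errors is None or errors < best_errors:
            best_errors = errors
    return best_errors / total_ref


# Toy example: hypothesis labels are swapped and one overlapped
# speaker is missed in the last frame.
ref = [(1, 0), (1, 0), (0, 1), (1, 1)]
hyp = [(0, 1), (0, 1), (1, 0), (0, 1)]
print(frame_der(ref, hyp))  # 0.2
```

The brute-force search over permutations is only practical for a handful of speakers; scoring tools typically use an assignment algorithm instead and operate on time segments rather than fixed frames.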

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Music and Audio Processing