The Third DIHARD Diarization Challenge

Neville Ryant, Prachi Singh, Venkat Krishnamohan, Rajat Varma, Kenneth Church, Christopher Cieri, Jun Du, Sriram Ganapathy, and Mark Liberman

arXiv:2012.01477·eess.AS·April 6, 2021

TL;DR

DIHARD III is a comprehensive challenge that assesses speaker diarization systems across diverse, real-world audio conditions, highlighting significant progress yet persistent challenges in the field.

Contribution

This paper presents the third DIHARD diarization challenge, which adds conversational telephone speech as a new evaluation domain and provides a large-scale benchmark of diarization robustness across 11 diverse domains.

Findings

1. Marked improvement in diarization accuracy since DIHARD I.
2. Significant progress for two-party interactions.
3. Challenges remain in domains such as web video.

Abstract

DIHARD III was the third in a series of speaker diarization challenges intended to improve the robustness of diarization systems to variability in recording equipment, noise conditions, and conversational domain. Speaker diarization was evaluated under two speech activity conditions (diarization using reference speech activity vs. diarization from scratch) and 11 diverse domains. The domains span a range of recording conditions and interaction types, including read audiobooks, meeting speech, clinical interviews, web videos, and, for the first time, conversational telephone speech. A total of 30 organizations (forming 21 teams) from industry and academia submitted 499 valid system outputs. The evaluation results indicate that speaker diarization has improved markedly since DIHARD I, particularly for two-party interactions, but that for many domains (e.g., web video) the problem remains…
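To make "diarization accuracy" concrete: DIHARD systems are ranked primarily by diarization error rate (DER), commonly defined as missed speech plus false-alarm speech plus speaker confusion, divided by total reference speech. The sketch below is a minimal, hypothetical frame-level illustration of that definition, not the official DIHARD III scoring tool. It assumes per-frame sets of speaker labels, no forgiveness collar, and an exhaustive search for the best one-to-one speaker mapping (feasible only for toy inputs). The function name `frame_der` and the input format are illustrative assumptions.

```python
from itertools import permutations


def frame_der(ref, hyp):
    """Simplified frame-level DER (illustrative sketch, not official scoring).

    ref, hyp: equal-length lists; each element is the set of speaker labels
    active in that frame (an empty set means non-speech).
    """
    if len(ref) != len(hyp):
        raise ValueError("ref and hyp must cover the same frames")
    total = sum(len(r) for r in ref)  # total reference speaker-frames
    if total == 0:
        raise ValueError("reference contains no speech")

    ref_spk = sorted({s for frame in ref for s in frame})
    hyp_spk = sorted({s for frame in hyp for s in frame})
    # Each hypothesis speaker maps to a distinct reference speaker or to None
    # (unmatched); padding with None allows more hyp than ref speakers.
    targets = ref_spk + [None] * len(hyp_spk)

    # Exhaustively find the mapping that maximizes correctly attributed frames.
    best_correct = 0
    for perm in permutations(targets, len(hyp_spk)):
        mapping = dict(zip(hyp_spk, perm))
        correct = sum(len(r & {mapping[s] for s in h}) for r, h in zip(ref, hyp))
        best_correct = max(best_correct, correct)

    missed = sum(max(0, len(r) - len(h)) for r, h in zip(ref, hyp))
    false_alarm = sum(max(0, len(h) - len(r)) for r, h in zip(ref, hyp))
    confusion = sum(min(len(r), len(h)) for r, h in zip(ref, hyp)) - best_correct
    return {
        "der": (missed + false_alarm + confusion) / total,
        "missed": missed,
        "false_alarm": false_alarm,
        "confusion": confusion,
    }


# Toy example: speakers A and B overlap in frame 4; frame 5 is non-speech.
ref = [{"A"}, {"A"}, {"B"}, {"A", "B"}, set()]
hyp = [{"s1"}, {"s1"}, {"s2"}, {"s1"}, {"s2"}]
print(frame_der(ref, hyp))
# {'der': 0.4, 'missed': 1, 'false_alarm': 1, 'confusion': 0}
```

In the toy example, the system misses one speaker in the overlapped frame and labels a non-speech frame as speech, giving 2 errors over 5 reference speaker-frames. Real scoring operates on time segments rather than a handful of frames and uses an efficient assignment algorithm instead of brute force.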