LEAP Submission for the Third DIHARD Diarization Challenge

Prachi Singh; Rajat Varma; Venkat Krishnamohan; Srikanth Raj Chetupalli; Sriram Ganapathy

arXiv:2104.02359·eess.AS·June 15, 2021·Interspeech

TL;DR

This paper presents LEAP's system for the DIHARD-III challenge, which pairs a speech bandwidth classifier with diarization systems tuned separately for narrowband and wideband speech, achieving 24% (Track-1) and 18% (Track-2) relative improvements over the organizers' baseline.

Contribution

The paper introduces a hybrid diarization system with separate models for narrowband speech (an end-to-end diarization system) and wideband speech (x-vector embeddings with path integral clustering), and reports gains over the baseline on both DIHARD-III tracks.

Findings

01

24% (Track-1) and 18% (Track-2) relative improvements over the organizers' baseline

02

Effective use of bandwidth classification for diarization

03

Post-evaluation analysis led to system enhancements
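
As a reading aid for the relative-improvement figures, the sketch below shows how such a percentage is typically computed from two error rates. It assumes a lower-is-better metric such as diarization error rate; the function name and the input values are illustrative only and are not results from the paper.

```python
def relative_improvement(baseline_error: float, system_error: float) -> float:
    """Percentage reduction in error relative to the baseline.

    Assumes a lower-is-better metric (for example, a diarization error rate).
    """
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - system_error) / baseline_error


# Made-up error rates chosen only to illustrate the arithmetic;
# they are NOT the scores reported for the challenge.
example = relative_improvement(baseline_error=25.0, system_error=19.0)
print(example)
```

A relative improvement is measured against the baseline's own error, so a drop from 25.0 to 19.0 counts as 24% relative even though the absolute change is 6 points.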

Abstract

The LEAP submission for the DIHARD-III challenge is described in this paper. The proposed system consists of a speech bandwidth classifier and diarization systems fine-tuned separately for narrowband and wideband speech. We use an end-to-end speaker diarization system for the narrowband conversational telephone speech recordings. For the wideband multi-speaker recordings, we use a neural embedding based clustering approach, similar to the baseline system: embeddings (x-vectors) are extracted from a time-delay neural network and then clustered with the graph-based path integral clustering (PIC) approach. The LEAP system showed 24% and 18% relative improvements for Track-1 and Track-2, respectively, over the baseline system provided by the organizers. This paper describes the challenge submission, the post-evaluation analysis, and the improvements observed on the DIHARD-III dataset.
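
The routing idea in the abstract (classify the bandwidth, then hand the recording to the matching diarization system) can be sketched as a small dispatcher. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `make_router`, `stub_classifier`, `stub_end_to_end`, and `stub_xvector_pic`, the `_cts.wav` filename convention, and the returned segments are all hypothetical placeholders standing in for the real classifier and diarization models.

```python
from typing import Callable, List, Tuple

# One diarization segment: (start_sec, end_sec, speaker_label).
Segment = Tuple[float, float, str]
Diarizer = Callable[[str], List[Segment]]


def make_router(
    classify_bandwidth: Callable[[str], str],
    narrowband_system: Diarizer,
    wideband_system: Diarizer,
) -> Diarizer:
    """Build a diarizer that dispatches on the predicted bandwidth class."""

    def diarize(path: str) -> List[Segment]:
        band = classify_bandwidth(path)
        if band == "narrowband":
            return narrowband_system(path)
        if band == "wideband":
            return wideband_system(path)
        raise ValueError(f"unknown bandwidth class: {band!r}")

    return diarize


# --- Hypothetical stand-ins; a real system would load trained models. ---

def stub_classifier(path: str) -> str:
    # Assumption for the demo only: telephone files end in "_cts.wav".
    return "narrowband" if path.endswith("_cts.wav") else "wideband"


def stub_end_to_end(path: str) -> List[Segment]:
    # Placeholder for an end-to-end diarization model.
    return [(0.0, 1.5, "A"), (1.5, 3.0, "B")]


def stub_xvector_pic(path: str) -> List[Segment]:
    # Placeholder for x-vector extraction followed by clustering.
    return [(0.0, 2.0, "spk1"), (2.0, 4.0, "spk2"), (4.0, 5.0, "spk1")]


diarize = make_router(stub_classifier, stub_end_to_end, stub_xvector_pic)
for name in ("call_cts.wav", "meeting.wav"):
    print(name, diarize(name))
```

Keeping the classifier and the two back ends as plain callables means either branch can be tuned or swapped without touching the dispatch logic.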
