Bi-LSTM Scoring Based Similarity Measurement with Agglomerative Hierarchical Clustering (AHC) for Speaker Diarization

Siddharth S. Nijhawan; Homayoon Beigi

arXiv:2205.09709·eess.AS·May 20, 2022

TL;DR

This paper introduces a Bi-LSTM based similarity measurement, combined with AHC clustering, for speaker diarization. By capturing the temporal structure of conversations, it reduces the diarization error rate relative to PLDA-based similarity scoring.

Contribution

Integrating a Bi-LSTM for similarity estimation with AHC clustering improves speaker diarization accuracy over PLDA-based similarity scoring.
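The clustering stage can be illustrated with a minimal sketch of agglomerative hierarchical clustering over a pairwise similarity matrix. This is not the authors' implementation: the matrix `scores` holds made-up values standing in for whatever a scoring model (Bi-LSTM or PLDA) would produce, and the average-linkage rule, the function name `ahc_from_similarity`, and the stopping `threshold` are all assumptions chosen for illustration.

```python
from itertools import combinations


def ahc_from_similarity(sim, threshold):
    """Average-linkage AHC over a symmetric similarity matrix.

    sim[i][j] is a similarity score between segments i and j
    (higher means more likely the same speaker). Merging stops when
    the best remaining pair of clusters scores below `threshold`.
    Returns one cluster label per segment.
    """
    clusters = [[i] for i in range(len(sim))]

    def linkage(a, b):
        # Average pairwise similarity between the members of two clusters.
        return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        # Pick the most similar pair of clusters (x < y by construction).
        x, y = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        if linkage(clusters[x], clusters[y]) < threshold:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected

    labels = [0] * len(sim)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Hypothetical scores for four segments: 0 and 2 are alike, 1 and 3 are alike.
scores = [
    [1.0, 0.1, 0.9, 0.2],
    [0.1, 1.0, 0.2, 0.8],
    [0.9, 0.2, 1.0, 0.1],
    [0.2, 0.8, 0.1, 1.0],
]
labels = ahc_from_similarity(scores, threshold=0.5)
print(labels)
```

In this sketch the quality of the final speaker labels depends entirely on the similarity matrix, which is why swapping the scoring model while keeping the clustering fixed can change the error rate.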

Findings

01. Achieved a DER of 34.80% on the ICSI Meeting Corpus.

02. Outperformed traditional PLDA-based similarity measurement.

03. Demonstrated effectiveness in handling overlapping speech segments.
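For readers unfamiliar with the metric, DER is conventionally the sum of missed speech, false-alarm speech, and speaker-confusion time, divided by the total reference speech time. The sketch below only illustrates that arithmetic. The durations are hypothetical, the function name is an assumption, and it says nothing about the scoring setup (collar, overlap handling) used in the paper.

```python
def diarization_error_rate(missed, false_alarm, confusion, total_speech):
    """Return DER as a fraction, with all arguments as durations in seconds."""
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return (missed + false_alarm + confusion) / total_speech


# Hypothetical durations, not taken from the paper.
der = diarization_error_rate(
    missed=30.0, false_alarm=20.0, confusion=50.0, total_speech=400.0
)
print(f"DER = {der:.2%}")
```

Because false alarms are counted in the numerator but not the denominator, DER can exceed 100% on difficult recordings.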

Abstract

Most speech signals, across different scenarios, are not available as well-defined audio segments containing only a single speaker. A typical conversation between two speakers contains segments where their voices overlap, where they interrupt each other, or where they pause between sentences. Recent advances in diarization technology use neural network-based approaches to improve several subsystems of a speaker diarization system, including extracting segment-wise embedding features and detecting speaker changes during a conversation. However, to identify speakers through clustering, models depend on methods such as PLDA to generate a similarity measure between two segments extracted from a given conversational audio. Since these algorithms ignore the temporal structure of conversations, they tend to achieve a higher Diarization Error Rate (DER), thus…

Taxonomy

Methods: Memory Network