Overlap-aware low-latency online speaker diarization based on end-to-end local segmentation

Juan M. Coria; Hervé Bredin; Sahar Ghannay; Sophie Rosset

arXiv:2109.06483 · eess.AS · September 15, 2021

TL;DR

This paper introduces an online speaker diarization method that combines incremental clustering with local diarization on a rolling buffer. It relies on overlap-aware segmentation, and its latency can be adjusted between 500ms and 5s.

Contribution

The paper presents an online diarization pipeline that integrates overlap-aware segmentation with a modified statistics pooling layer and cannot-link constraints, targeting low latency and improved accuracy.
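
As an illustration of the cannot-link idea, the sketch below assigns the local speakers found in one buffer to global speakers so that two local speakers never land on the same global speaker. It is a minimal sketch, not the authors' implementation: the greedy matching, the cosine distance, the `threshold` parameter, and the function names are all assumptions made for this example, embeddings are assumed to be non-zero vectors, and centroid updates are omitted.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two non-zero vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def assign_with_cannot_link(local_embeddings, centroids, threshold=0.5):
    """Map each local speaker index to a global speaker index.

    Hypothetical helper: local speakers from the same buffer are never
    mapped to the same global speaker (the cannot-link constraint).
    Unmatched local speakers are appended to `centroids` as new speakers.
    """
    # All (distance, local index, global index) candidates, closest first.
    pairs = sorted(
        (cosine_distance(e, c), i, j)
        for i, e in enumerate(local_embeddings)
        for j, c in enumerate(centroids)
    )
    assignment = {}
    used_globals = set()
    for dist, i, j in pairs:
        if dist > threshold:
            break  # remaining candidates are even farther away
        if i in assignment or j in used_globals:
            continue  # cannot-link: one global speaker per local speaker
        assignment[i] = j
        used_globals.add(j)
    # Any local speaker left unmatched becomes a new global speaker.
    for i, e in enumerate(local_embeddings):
        if i not in assignment:
            centroids.append(list(e))
            assignment[i] = len(centroids) - 1
    return assignment


centroids = [[1.0, 0.0]]
local = [[0.9, 0.1], [1.0, 0.05]]
assignment = assign_with_cannot_link(local, centroids)
```

Both local embeddings are close to the single existing centroid, but only the closer one is matched to it; the other is registered as a new global speaker instead of being merged.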

Findings

01

Effective overlap-aware segmentation improves diarization accuracy.

02

Latency can be tuned between 500ms and 5s with systematic performance analysis.

03

Method outperforms baseline approaches on AMI, DIHARD, and VoxConverse datasets.

Abstract

We propose to address online speaker diarization as a combination of incremental clustering and local diarization applied to a rolling buffer updated every 500ms. Every single step of the proposed pipeline is designed to take full advantage of the strong ability of a recently proposed end-to-end overlap-aware segmentation to detect and separate overlapping speakers. In particular, we propose a modified version of the statistics pooling layer (initially introduced in the x-vector architecture) to give less weight to frames where the segmentation model predicts simultaneous speakers. Furthermore, we derive cannot-link constraints from the initial segmentation step to prevent two local speakers from being wrongfully merged during the incremental clustering step. Finally, we show how the latency of the proposed approach can be adjusted between 500ms and 5s to match the requirements of a…
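
The modified statistics pooling described in the abstract can be pictured as a weighted mean and standard deviation over frames, where frames with simultaneous speakers receive less weight. The following is a minimal sketch under stated assumptions: the weighting rule (target speaker activation times one minus the strongest competing activation) and the function names are illustrative choices for this example and are not confirmed by the abstract.

```python
import math


def overlap_aware_weights(activations, speaker):
    """Per-frame weights for one speaker.

    activations[t][k] is assumed to be the segmentation model's probability
    in [0, 1] that speaker k is active at frame t. The rule below is an
    assumption: it shrinks the weight when another speaker is also active.
    """
    weights = []
    for frame in activations:
        others = max((p for k, p in enumerate(frame) if k != speaker), default=0.0)
        weights.append(frame[speaker] * (1.0 - others))
    return weights


def weighted_stats_pooling(features, weights, eps=1e-8):
    """Concatenate the weighted mean and weighted std of frame-level features."""
    dim = len(features[0])
    total = sum(weights) + eps  # eps avoids division by zero
    mean = [
        sum(w * f[d] for f, w in zip(features, weights)) / total
        for d in range(dim)
    ]
    var = [
        sum(w * (f[d] - mean[d]) ** 2 for f, w in zip(features, weights)) / total
        for d in range(dim)
    ]
    return mean + [math.sqrt(v) for v in var]


# Three frames of 2-dimensional features; the last frame is fully overlapped.
features = [[1.0, 0.0], [3.0, 0.0], [100.0, 50.0]]
activations = [[1.0, 0.0], [1.0, 0.0], [1.0, 1.0]]
weights = overlap_aware_weights(activations, speaker=0)
pooled = weighted_stats_pooling(features, weights)
```

In this toy input the overlapped third frame gets weight zero, so its outlying feature values do not contaminate the pooled statistics of speaker 0.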

Code & Models

Repositories

juanmc2005/streamingspeakerdiarization
PyTorch · Official

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Video Analysis and Summarization