Low-Latency Speech Separation Guided Diarization for Telephone Conversations

Giovanni Morrone; Samuele Cornell; Desh Raj; Luca Serafini; Enrico Zovato; Alessio Brutti; Stefano Squartini

arXiv:2204.02306 · eess.AS · October 28, 2022 · SLT · 1 citation

Open Access · 1 Repo

TL;DR

This paper evaluates low-latency speech separation guided diarization (SSGD) for telephone conversations, demonstrating competitive diarization error rates and speech recognition performance with less data and lower latency than state-of-the-art methods.

Contribution

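The paper compares two low-latency speech separation models within an SSGD pipeline and presents a post-processing algorithm that significantly reduces false alarm errors. Its online DPRNN-based system reaches 11.1% DER on CALLHOME at a latency of 0.1 seconds, and the separated signals can also be used for speech recognition.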

Findings

01. DPRNN-based online SSGD achieves 11.1% DER on CALLHOME

02. Post-processing significantly reduces false alarms

03. Separated signals enable near-oracle speech recognition performance
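The diarization error rate (DER) quoted in the findings is conventionally the sum of missed speech, false alarm, and speaker confusion time divided by the total reference speech time. The sketch below shows a simplified frame-level version of that formula. It is not the scoring setup used in the paper: the function name `frame_der` is hypothetical, the speaker labels in reference and hypothesis are assumed to be already mapped to each other, and no forgiveness collar is applied.

```python
from typing import Dict, List, Set


def frame_der(reference: List[Set[str]], hypothesis: List[Set[str]]) -> Dict[str, float]:
    """Frame-level DER sketch (assumes speaker labels are already aligned).

    Each list holds one set of active speaker labels per frame.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same number of frames")

    miss = false_alarm = confusion = total = 0
    for ref, hyp in zip(reference, hypothesis):
        n_ref, n_hyp = len(ref), len(hyp)
        n_correct = len(ref & hyp)
        # Reference speakers with no hypothesis speaker to match them.
        miss += max(0, n_ref - n_hyp)
        # Hypothesis speakers beyond the number of reference speakers.
        false_alarm += max(0, n_hyp - n_ref)
        # Matched slots that carry the wrong speaker label.
        confusion += min(n_ref, n_hyp) - n_correct
        total += n_ref

    if total == 0:
        raise ValueError("reference contains no speech")

    return {
        "miss": miss,
        "false_alarm": false_alarm,
        "confusion": confusion,
        "der": (miss + false_alarm + confusion) / total,
    }


# Toy example: five frames, two speakers.
reference = [{"A"}, {"A"}, {"A", "B"}, {"B"}, set()]
hypothesis = [{"A"}, {"B"}, {"A"}, {"B"}, {"B"}]
result = frame_der(reference, hypothesis)
```

In this toy example the last frame contributes a false alarm, which is the error type the post-processing step in finding 02 targets.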

Abstract

In this paper, we carry out an analysis on the use of speech separation guided diarization (SSGD) in telephone conversations. SSGD performs diarization by separating the speakers' signals and then applying voice activity detection on each estimated speaker signal. In particular, we compare two low-latency speech separation models. Moreover, we show a post-processing algorithm that significantly reduces the false alarm errors of an SSGD pipeline. We perform our experiments on two datasets: Fisher Corpus Part 1 and CALLHOME, evaluating both separation and diarization metrics. Notably, our SSGD DPRNN-based online model achieves 11.1% DER on CALLHOME, comparable with most state-of-the-art end-to-end neural diarization models despite being trained on an order of magnitude less data and having considerably lower latency, i.e., 0.1 vs. 10 seconds. We also show that the separated signals can be…
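The two-stage idea described in the abstract (separate the speakers, then run voice activity detection on each estimated signal) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the separation stage is taken as already done (the paper uses neural separation models such as DPRNN), a simple energy threshold stands in for the VAD, and the names `energy_vad`, `flags_to_segments`, and `ssgd_diarize` are hypothetical rather than taken from the authors' repository.

```python
from typing import List, Tuple

Segment = Tuple[int, float, float]  # (speaker index, start in s, end in s)


def energy_vad(signal: List[float], frame_len: int, threshold: float) -> List[bool]:
    """Flag each non-overlapping frame as speech if its mean energy exceeds threshold.

    An energy threshold is a stand-in here, not the VAD used in the paper.
    """
    flags = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        flags.append(energy > threshold)
    return flags


def flags_to_segments(flags: List[bool], speaker: int, frame_dur: float) -> List[Segment]:
    """Merge runs of consecutive speech frames into (speaker, start, end) segments."""
    segments: List[Segment] = []
    seg_start = None
    for i, active in enumerate(flags):
        if active and seg_start is None:
            seg_start = i
        elif not active and seg_start is not None:
            segments.append((speaker, seg_start * frame_dur, i * frame_dur))
            seg_start = None
    if seg_start is not None:
        segments.append((speaker, seg_start * frame_dur, len(flags) * frame_dur))
    return segments


def ssgd_diarize(separated: List[List[float]], sample_rate: int,
                 frame_len: int = 4, threshold: float = 0.01) -> List[Segment]:
    """Run VAD on each separated stream; stream index doubles as speaker identity."""
    frame_dur = frame_len / sample_rate
    segments: List[Segment] = []
    for speaker, signal in enumerate(separated):
        flags = energy_vad(signal, frame_len, threshold)
        segments.extend(flags_to_segments(flags, speaker, frame_dur))
    return sorted(segments, key=lambda seg: seg[1])


# Toy input: two already-separated streams at 4 Hz, so one frame lasts 1 s.
speaker0 = [0.5] * 8 + [0.0] * 8
speaker1 = [0.0] * 4 + [0.5] * 8 + [0.0] * 4
segments = ssgd_diarize([speaker0, speaker1], sample_rate=4)
```

Because each stream is processed independently, overlapping speech (here between 1 s and 2 s) is handled without a dedicated overlap detector. The flip side is that residual leakage from one speaker into the other stream can trigger the VAD and produce false alarms.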

Code & Models

Repositories

dr-pato/ssgd (PyTorch, official)

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Phonetics and Phonology Research

Methods: End-to-End Neural Diarization