DiaCorrect: Error Correction Back-end For Speaker Diarization

Jiangyu Han; Federico Landini; Johan Rohdin; Mireia Diez; Lukas Burget; Yuhang Cao; Heng Lu; Jan Cernocky

arXiv:2309.08377 · eess.AS · September 18, 2023

TL;DR

DiaCorrect is an error correction framework that refines the output of a speaker diarization system using two parallel convolutional encoders and a Transformer-based decoder. On 2-speaker telephony data, it reduces the diarization errors of the initial system.

Contribution

It introduces an error correction approach for speaker diarization, inspired by error correction techniques in automatic speech recognition, that combines two parallel convolutional encoders with a Transformer-based decoder.

Findings

01. Improves the initial diarization system's results on 2-speaker telephony data

02. Reduces diarization errors relative to the initial system's output

03. Open-source implementation available

Abstract

In this work, we propose an error correction framework, named DiaCorrect, to refine the output of a diarization system in a simple yet effective way. This method is inspired by error correction techniques in automatic speech recognition. Our model consists of two parallel convolutional encoders and a Transformer-based decoder. By exploiting the interactions between the input recording and the initial system's outputs, DiaCorrect can automatically correct the initial speaker activities to minimize the diarization errors. Experiments on 2-speaker telephony data show that the proposed DiaCorrect can effectively improve the initial model's results. Our source code is publicly available at https://github.com/BUTSpeechFIT/diacorrect.
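The abstract says DiaCorrect corrects the initial speaker activities to minimize diarization errors. As a rough illustration of what that error measures, the Python sketch below computes a simplified frame-level diarization error rate (DER) for 2-speaker activity sequences: missed speech, false alarm, and speaker confusion are summed over frames and divided by the total reference speech, using the best mapping between hypothesis and reference speaker labels. It is not taken from the DiaCorrect repository. The function name `frame_der` and the toy activity data are hypothetical, and the sketch assumes both outputs have the same number of speakers. It also ignores the forgiveness collar and time-based segment scoring that standard scoring tools may apply.

```python
from itertools import permutations


def frame_der(ref, hyp):
    """Simplified frame-level diarization error rate (hypothetical helper).

    ref, hyp: equal-length lists of per-frame tuples of 0/1 speaker activity,
    with the same number of speakers in both. Hypothesis speaker labels are
    arbitrary, so every speaker permutation is tried and the lowest error kept.
    No collar is applied.
    """
    n_spk = len(ref[0])
    total_speech = sum(sum(frame) for frame in ref)
    if total_speech == 0:
        raise ValueError("reference contains no speech")
    best_errors = None
    for perm in permutations(range(n_spk)):
        errors = 0
        for r, h in zip(ref, hyp):
            h = [h[p] for p in perm]  # relabel hypothesis speakers
            n_ref, n_sys = sum(r), sum(h)
            n_correct = sum(1 for a, b in zip(r, h) if a and b)
            errors += max(0, n_ref - n_sys)  # missed speech
            errors += max(0, n_sys - n_ref)  # false alarm
            errors += min(n_ref, n_sys) - n_correct  # speaker confusion
        if best_errors is None or errors < best_errors:
            best_errors = errors
    return best_errors / total_speech


# Toy 2-speaker example, 6 frames (hypothetical data, not from the paper).
ref = [(1, 0), (1, 0), (1, 1), (0, 1), (0, 1), (0, 0)]
# "Initial" output: speaker labels swapped, one speaker missed in the
# overlapped third frame, and a false alarm in the last frame.
initial = [(0, 1), (0, 1), (0, 1), (1, 0), (1, 0), (1, 0)]
# "Corrected" output: matches the reference up to a speaker-label swap.
corrected = [(0, 1), (0, 1), (1, 1), (1, 0), (1, 0), (0, 0)]

print(round(frame_der(ref, initial), 3))    # 0.333
print(round(frame_der(ref, corrected), 3))  # 0.0
```

At toy scale, this mirrors the setting described above: a back-end that turns `initial` into `corrected` lowers the DER, here from 1/3 to 0. The label swap alone costs nothing because the permutation search treats speaker labels as interchangeable.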

Code & Models

Repositories

butspeechfit/diacorrect (PyTorch, official)

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing