DiaCorrect: Error Correction Back-end For Speaker Diarization
Jiangyu Han, Federico Landini, Johan Rohdin, Mireia Diez, Lukas Burget, Yuhang Cao, Heng Lu, Jan Cernocky

TL;DR
DiaCorrect is an error correction framework that refines the output of a speaker diarization system using two parallel convolutional encoders and a transformer-based decoder, reducing diarization errors on 2-speaker telephony data.
Contribution
It introduces an error correction back-end for speaker diarization, inspired by error correction techniques in automatic speech recognition, built from two parallel convolutional encoders and a transformer-based decoder.
Findings
Improves the initial diarization model's results on 2-speaker telephony data
Corrects the initial speaker activities to reduce diarization errors
Open-source implementation available
Abstract
In this work, we propose an error correction framework, named DiaCorrect, to refine the output of a diarization system in a simple yet effective way. This method is inspired by error correction techniques in automatic speech recognition. Our model consists of two parallel convolutional encoders and a transformer-based decoder. By exploiting the interactions between the input recording and the initial system's outputs, DiaCorrect can automatically correct the initial speaker activities to minimize the diarization errors. Experiments on 2-speaker telephony data show that the proposed DiaCorrect can effectively improve the initial model's results. Our source code is publicly available at https://github.com/BUTSpeechFIT/diacorrect.
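To make "diarization errors" concrete, the sketch below computes a simplified frame-level diarization error rate (DER) for per-frame binary speaker activities, the kind of output a correction back-end would refine. It is not taken from the DiaCorrect code: the function name `frame_der` and the toy activity sequences are hypothetical, and the metric is simplified (no forgiveness collar, fixed-length frames, missed speech, false alarm, and speaker confusion counted per frame under the best speaker permutation).

```python
from itertools import permutations


def frame_der(reference, hypothesis):
    """Simplified frame-level DER (assumption: no collar, equal-length inputs).

    Both arguments are lists of frames; each frame is a tuple of 0/1
    activity flags, one per speaker.
    """
    n_spk = len(reference[0])
    total_ref = sum(sum(frame) for frame in reference)
    if total_ref == 0:
        raise ValueError("reference contains no speech")

    best_errors = None
    # Speaker labels are arbitrary, so try every mapping of hypothesis
    # speakers onto reference speakers and keep the one with fewest errors.
    for perm in permutations(range(n_spk)):
        errors = 0
        for ref, hyp in zip(reference, hypothesis):
            hyp_p = [hyp[p] for p in perm]
            n_ref, n_hyp = sum(ref), sum(hyp_p)
            n_correct = sum(r & h for r, h in zip(ref, hyp_p))
            missed = max(0, n_ref - n_hyp)
            false_alarm = max(0, n_hyp - n_ref)
            confusion = min(n_ref, n_hyp) - n_correct
            errors += missed + false_alarm + confusion
        if best_errors is None or errors < best_errors:
            best_errors = errors
    return best_errors / total_ref


# Hypothetical toy data: 6 frames, 2 speakers.
REFERENCE = [(1, 0), (1, 0), (1, 1), (0, 1), (0, 1), (0, 0)]
INITIAL = [(1, 0), (0, 0), (1, 0), (0, 1), (0, 1), (1, 0)]    # first-pass system output
CORRECTED = [(1, 0), (1, 0), (1, 1), (0, 1), (0, 1), (1, 0)]  # after a hypothetical correction

if __name__ == "__main__":
    print("initial DER:  ", frame_der(REFERENCE, INITIAL))
    print("corrected DER:", frame_der(REFERENCE, CORRECTED))
```

In this toy case the corrected activities recover one missed frame and one missed overlap, so the error rate drops relative to the initial output. Published DER figures are usually computed with a scoring tool and a collar, so values from this sketch are not comparable to reported results.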
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
