LLM-based speaker diarization correction: A generalizable approach
Georgios Efstathiadis, Vijay Yadav, Anzar Abbas

TL;DR
This paper explores using fine-tuned large language models to improve speaker diarization accuracy post-ASR, proposing an ensemble approach for better generalizability across different ASR tools.
Contribution
It introduces a novel ensemble method combining models fine-tuned on transcripts from various ASR tools to enhance generalizability in diarization correction.
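The abstract says only that the ensemble combines weights from the fine-tuned models; the exact combination rule is cut off on this page. The sketch below shows one common way to do this, a weighted element-wise average of parameters across models that share an architecture. It is an illustrative assumption, not the authors' implementation. `average_weights` and the plain-dict "state dict" (parameter name to flat list of floats) are hypothetical stand-ins for real framework tensors.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for a model state dict: parameter name -> flat list of floats.
StateDict = Dict[str, List[float]]


def average_weights(models: List[StateDict],
                    coeffs: Optional[List[float]] = None) -> StateDict:
    """Merge models by a weighted element-wise average of their parameters.

    Assumes every model has the same parameter names and shapes, e.g. one base
    LLM fine-tuned separately on transcripts from different ASR tools.
    """
    if not models:
        raise ValueError("need at least one model")
    if coeffs is None:
        coeffs = [1.0] * len(models)  # uniform average by default
    if len(coeffs) != len(models):
        raise ValueError("need one coefficient per model")
    total = sum(coeffs)
    weights = [c / total for c in coeffs]  # normalise so the weights sum to 1

    names = models[0].keys()
    for m in models[1:]:
        if m.keys() != names:
            raise ValueError("models must share parameter names")

    merged: StateDict = {}
    for name in names:
        size = len(models[0][name])
        if any(len(m[name]) != size for m in models):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Each merged value is the weighted sum of that position across models.
        merged[name] = [sum(w * m[name][i] for w, m in zip(weights, models))
                        for i in range(size)]
    return merged


if __name__ == "__main__":
    model_a = {"w": [1.0, 2.0], "b": [0.0]}
    model_b = {"w": [3.0, 4.0], "b": [2.0]}
    print(average_weights([model_a, model_b]))  # {'w': [2.0, 3.0], 'b': [1.0]}
```

Parameter averaging yields a single model with the same inference cost as any one member. By contrast, an output-level ensemble would run every fine-tuned model per transcript.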
Findings
Fine-tuned LLMs markedly improve diarization accuracy on transcripts produced by the same ASR tool used for fine-tuning.
Ensemble models outperform individual models across different ASR tools.
The approach offers a more generalizable, ASR-agnostic diarization correction method.
Abstract
Speaker diarization is necessary for interpreting conversations transcribed using automated speech recognition (ASR) tools. Despite significant developments in diarization methods, diarization accuracy remains an issue. Here, we investigate the use of large language models (LLMs) for diarization correction as a post-processing step. LLMs were fine-tuned using the Fisher corpus, a large dataset of transcribed conversations. The ability of the models to improve diarization accuracy in a holdout dataset from the Fisher corpus as well as an independent dataset was measured. We report that fine-tuned LLMs can markedly improve diarization accuracy. However, model performance is constrained to transcripts produced using the same ASR tool as the transcripts used for fine-tuning, limiting generalizability. To address this constraint, an ensemble model was developed by combining weights from…
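The abstract does not name the accuracy metric on this page. As a rough illustration of what "diarization accuracy" can mean for a transcript, the sketch below scores the fraction of words assigned to the correct speaker, taking the best one-to-one renaming of hypothesis speakers. This is an assumed, simplified metric, not necessarily the one used in the paper. It also assumes the reference and hypothesis label the same, already-aligned word sequence. `speaker_label_accuracy` is a hypothetical name.

```python
from itertools import permutations
from typing import List, Optional, Sequence


def speaker_label_accuracy(reference: Sequence[str],
                           hypothesis: Sequence[str]) -> float:
    """Fraction of words whose speaker label is correct under the best
    one-to-one mapping from hypothesis speakers to reference speakers.

    Assumes both sequences label the same, already-aligned words.
    Brute-force search over mappings: fine for the few speakers in a call.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must label the same words")
    if not reference:
        return 1.0
    ref_spk = sorted(set(reference))
    hyp_spk = sorted(set(hypothesis))
    # Pad with None so extra hypothesis speakers can map to "no reference speaker".
    targets: List[Optional[str]] = list(ref_spk)
    targets += [None] * max(0, len(hyp_spk) - len(ref_spk))
    best = 0
    for perm in permutations(targets, len(hyp_spk)):
        mapping = dict(zip(hyp_spk, perm))
        correct = sum(mapping[h] == r for h, r in zip(hypothesis, reference))
        best = max(best, correct)
    return best / len(reference)


if __name__ == "__main__":
    ref = ["A", "A", "B", "B", "A"]
    asr = ["S1", "S1", "S1", "S2", "S1"]        # third word attributed to the wrong speaker
    corrected = ["S1", "S1", "S2", "S2", "S1"]  # e.g. after an LLM correction pass
    print(speaker_label_accuracy(ref, asr))        # 0.8
    print(speaker_label_accuracy(ref, corrected))  # 1.0
```

Comparing the score before and after the correction step gives a simple measure of how much a post-processing model helps on a given transcript.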
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques
