End-to-End Speaker Diarization as Post-Processing

Shota Horiguchi; Paola Garcia; Yusuke Fujita; Shinji Watanabe; Kenji Nagamatsu

arXiv:2012.10055 · eess.AS · December 24, 2020 · 1 cites

Open Access

TL;DR

This paper proposes a hybrid speaker diarization approach in which a two-speaker end-to-end model post-processes the output of a clustering-based method, so that overlapping speech can be assigned to more than one speaker. It reports improved performance on the CALLHOME, AMI, and DIHARD II datasets.

Contribution

The paper introduces an iterative post-processing technique that refines clustering-based diarization output with a two-speaker end-to-end model. This addresses a limitation of clustering, which assigns each frame to exactly one speaker and therefore cannot represent overlapping speech.
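To make the limitation concrete, here is a minimal sketch of the two output representations. The function names, the use of -1 for non-speech frames, and the toy data are assumptions made for illustration and do not come from the paper.

```python
from typing import List

def labels_to_activity(labels: List[int], num_speakers: int) -> List[List[int]]:
    """Convert one-label-per-frame clustering output into a frames x speakers
    0/1 matrix. A label of -1 marks a non-speech frame (assumption of this sketch)."""
    return [[1 if lab == s else 0 for s in range(num_speakers)] for lab in labels]

def overlap_frames(activity: List[List[int]]) -> List[int]:
    """Return the indices of frames in which more than one speaker is active."""
    return [t for t, row in enumerate(activity) if sum(row) > 1]

# Clustering-style output: each frame carries at most one speaker label.
clustered = labels_to_activity([0, 0, 1, 1, -1], num_speakers=2)

# Multi-label output: each speaker has an independent 0/1 decision per frame,
# so two speakers can be active in the same frame.
multilabel = [[1, 0], [1, 1], [1, 1], [0, 1], [0, 0]]
```

With one label per frame, `overlap_frames(clustered)` can never report an overlapped frame, whereas the multi-label matrix can mark frames where both speakers talk at once.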

Findings

01 Improved diarization accuracy across CALLHOME, AMI, and DIHARD II datasets.

02 Effective handling of overlapping speech through iterative refinement.

03 Consistent performance gains over state-of-the-art methods.

Abstract

This paper investigates the utilization of an end-to-end diarization model as post-processing of conventional clustering-based diarization. Clustering-based diarization methods partition frames into clusters of the number of speakers; thus, they typically cannot handle overlapping speech because each frame is assigned to one speaker. On the other hand, some end-to-end diarization methods can handle overlapping speech by treating the problem as multi-label classification. Although some methods can treat a flexible number of speakers, they do not perform well when the number of speakers is large. To compensate for each other's weakness, we propose to use a two-speaker end-to-end diarization method as post-processing of the results obtained by a clustering-based method. We iteratively select two speakers from the results and update the results of the two speakers to improve the overlapped…
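The iterative idea described in the abstract can be sketched roughly as follows. This is an illustrative outline under stated assumptions, not the authors' implementation: `refine_pairs` and `two_speaker_model` are hypothetical names, the stand-in model receives frame indices instead of audio features, every speaker pair is visited once, the frame-selection rule is a guess, and the update is applied unconditionally.

```python
from itertools import combinations
from typing import Callable, List, Tuple

Activity = List[List[int]]  # frames x speakers, entries are 0 or 1

def refine_pairs(
    activity: Activity,
    two_speaker_model: Callable[[List[int]], List[Tuple[int, int]]],
) -> Activity:
    """Refine a diarization result pair by pair with a two-speaker model.

    `two_speaker_model` is a hypothetical stand-in: given frame indices, it
    returns one (speaker_a, speaker_b) 0/1 decision per frame.
    """
    result = [row[:] for row in activity]  # work on a copy
    num_speakers = len(result[0]) if result else 0
    for i, j in combinations(range(num_speakers), 2):
        # Assumption: process the frames where at least one of the pair is active.
        frames = [t for t, row in enumerate(result) if row[i] or row[j]]
        if not frames:
            continue
        pred = two_speaker_model(frames)
        # The model's output order is arbitrary, so pick the ordering that
        # agrees best with the current labels for speakers i and j.
        keep = sum((a == result[t][i]) + (b == result[t][j])
                   for t, (a, b) in zip(frames, pred))
        swap = sum((b == result[t][i]) + (a == result[t][j])
                   for t, (a, b) in zip(frames, pred))
        for t, (a, b) in zip(frames, pred):
            result[t][i], result[t][j] = (a, b) if keep >= swap else (b, a)
    return result
```

A toy call would pass a clustering-style matrix such as `[[1, 0], [1, 0], [0, 1]]` together with a function that returns both speakers active for the middle frame; the returned matrix then marks that frame as overlapped while leaving the input untouched.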

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing