Robust End-to-end Speaker Diarization with Generic Neural Clustering
Chenyu Yang, Yu Wang

TL;DR
This paper introduces a robust neural clustering method for end-to-end speaker diarization that dynamically estimates speaker count and outperforms traditional clustering methods, especially under mismatched conditions.
Contribution
The paper presents a novel neural clustering approach that integrates seamlessly with any chunk-level predictor, enabling fully supervised, dynamic speaker number estimation in diarization.
Findings
Achieves a lower diarization error rate (DER) than constrained K-means clustering under mismatched conditions
Enables dynamic estimation of the number of speakers during inference
Integrates with any chunk-level predictor for fully supervised diarization
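For readers unfamiliar with the metric, DER (diarization error rate) is conventionally the sum of false-alarm, missed-speech, and speaker-confusion time divided by the total reference speech time. The sketch below shows that arithmetic only. The function name and the numbers in the usage lines are illustrative and are not taken from the paper.

```python
def diarization_error_rate(false_alarm: float, missed: float,
                           confusion: float, total_speech: float) -> float:
    """Return DER as a fraction, given error durations in seconds.

    All four arguments are durations measured against the reference
    annotation; the function name is hypothetical, chosen for illustration.
    """
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return (false_alarm + missed + confusion) / total_speech


# Illustrative durations only (not results from the paper).
der = diarization_error_rate(false_alarm=1.0, missed=2.0,
                             confusion=1.0, total_speech=40.0)
print(f"DER = {der:.1%}")
```

Lower is better, and the value can exceed 100% when the error time is larger than the reference speech time.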
Abstract
End-to-end speaker diarization approaches have shown exceptional performance compared with traditional modular approaches. To further improve the performance of end-to-end speaker diarization on real speech recordings, recent works have proposed integrating unsupervised clustering algorithms with end-to-end neural diarization models. However, these methods have a number of drawbacks: 1) the unsupervised clustering algorithms cannot leverage the supervision from the available datasets; 2) the K-means-based unsupervised algorithms that have been explored often suffer from the constraint violation problem; 3) there is an unavoidable mismatch between supervised training and unsupervised inference. In this paper, a robust generic neural clustering approach is proposed that can be integrated with any chunk-level predictor to accomplish a fully supervised end-to-end speaker…
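The abstract does not spell out the constraint behind the "constraint violation problem". One common reading in chunk-wise diarization is that the speaker embeddings extracted from a single chunk belong to different speakers, so no two of them should land in the same cluster. Assuming that reading, the minimal sketch below counts such violations for a given cluster assignment. The function name, inputs, and toy labels are hypothetical and are not the authors' code.

```python
from collections import Counter
from typing import Hashable, Sequence


def count_cannot_link_violations(chunk_ids: Sequence[Hashable],
                                 cluster_labels: Sequence[Hashable]) -> int:
    """Count embedding pairs that share both a chunk and a cluster.

    Assumption (not stated in the abstract): embeddings from the same
    chunk represent different speakers and must not be merged.
    """
    if len(chunk_ids) != len(cluster_labels):
        raise ValueError("chunk_ids and cluster_labels must have equal length")
    # Number of embeddings per (chunk, cluster) cell.
    cell_counts = Counter(zip(chunk_ids, cluster_labels))
    # n embeddings in one cell give n * (n - 1) / 2 violating pairs.
    return sum(n * (n - 1) // 2 for n in cell_counts.values())


# Toy example: three chunks with two embeddings each.
chunk_ids = [0, 0, 1, 1, 2, 2]
cluster_labels = [0, 1, 0, 0, 1, 0]  # chunk 1 puts both embeddings in cluster 0
violations = count_cannot_link_violations(chunk_ids, cluster_labels)
print(violations)
```

A clustering step that ignores chunk membership can return a non-zero count here, which is the kind of failure the abstract attributes to K-means-based methods.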
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing
