Multi-class Spectral Clustering with Overlaps for Speaker Diarization
Desh Raj, Zili Huang, Sanjeev Khudanpur

TL;DR
This paper introduces an overlap-aware spectral clustering method for speaker diarization. It relaxes the discrete clustering problem into a convex one solved by eigen-decomposition and uses an overlap detector to constrain the final assignment, reaching a 24.0% test DER on the mixed-headset setting of the AMI meeting corpus.
Contribution
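It reformulates overlap-aware clustering as a convex optimization problem solved by eigen-decomposition, discretizes the solution under constraints from an overlap detector, and describes an HMM-DNN-based overlap detector that enforces duration constraints through HMM state transitions.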
Findings
Achieves a test DER of 24.0% on the mixed-headset setting of the AMI meeting corpus, a 15.2% relative improvement over the baseline.
Effective in high-overlap conditions as shown on LibriCSS data.
Outperforms other overlap-aware diarization methods.
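The 15.2% figure is a relative improvement, not an absolute drop in DER. As a small worked check, the sketch below shows how the two reported numbers relate. The function names are illustrative, and the implied baseline is derived from the reported 24.0% and 15.2%, not quoted from the paper.

```python
def relative_improvement(baseline_der, new_der):
    """Fraction by which new_der improves on baseline_der."""
    return (baseline_der - new_der) / baseline_der


def implied_baseline(new_der, rel_improvement):
    """Baseline DER implied by a new DER and a relative improvement."""
    return new_der / (1.0 - rel_improvement)


# Derived from the reported figures (24.0% DER, 15.2% relative improvement).
baseline = implied_baseline(24.0, 0.152)
print(f"implied baseline DER: {baseline:.1f}%")
```

Under these numbers the baseline sits near 28.3% DER, so the absolute reduction is a little over 4 points.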
Abstract
This paper describes a method for overlap-aware speaker diarization. Given an overlap detector and a speaker embedding extractor, our method performs spectral clustering of segments informed by the output of the overlap detector. This is achieved by transforming the discrete clustering problem into a convex optimization problem which is solved by eigen-decomposition. Thereafter, we discretize the solution by alternately applying singular value decomposition and a modified version of non-maximal suppression which is constrained by the output of the overlap detector. Furthermore, we detail an HMM-DNN-based overlap detector which performs frame-level classification and enforces duration constraints through HMM state transitions. Our method achieves a test diarization error rate (DER) of 24.0% on the mixed-headset setting of the AMI meeting corpus, which is a relative improvement of 15.2%…
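The pipeline in the abstract (eigen-decomposition, then alternating SVD and overlap-constrained non-maximal suppression) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes a symmetric, non-negative affinity matrix, at least two speakers, a known speaker count, and a boolean per-segment overlap mask. The rule "keep the top two speakers for overlapping segments" is an assumption made here for illustration, as are all function and parameter names.

```python
import numpy as np


def overlap_aware_spectral_clustering(affinity, num_speakers, overlap_mask, n_iter=20):
    """Return an (N, K) boolean assignment; overlap rows get two speakers.

    Hypothetical sketch: assumes num_speakers >= 2 and positive row sums.
    """
    overlap_mask = np.asarray(overlap_mask, dtype=bool)
    n = affinity.shape[0]

    # Relaxed problem: top-K eigenvectors of the normalized affinity.
    d_inv_sqrt = 1.0 / np.sqrt(affinity.sum(axis=1))
    norm_aff = affinity * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(norm_aff)  # eigenvalues in ascending order
    z = eigvecs[:, -num_speakers:] * d_inv_sqrt[:, None]
    x_cont = z / np.linalg.norm(z, axis=1, keepdims=True)

    # Initialize the rotation from mutually dissimilar rows of x_cont.
    rotation = np.zeros((num_speakers, num_speakers))
    rotation[:, 0] = x_cont[0]
    acc = np.zeros(n)
    for j in range(1, num_speakers):
        acc += np.abs(x_cont @ rotation[:, j - 1])
        rotation[:, j] = x_cont[np.argmin(acc)]

    rows = np.arange(n)
    for _ in range(n_iter):
        # Modified non-maximal suppression: keep the best speaker per row,
        # plus the second best where the overlap detector fired.
        scores = x_cont @ rotation
        order = np.argsort(-scores, axis=1)
        x_disc = np.zeros_like(scores)
        x_disc[rows, order[:, 0]] = 1.0
        x_disc[rows[overlap_mask], order[overlap_mask, 1]] = 1.0

        # SVD step: orthogonal rotation that best aligns x_cont with x_disc.
        u, _, vt = np.linalg.svd(x_disc.T @ x_cont)
        rotation = vt.T @ u.T

    return x_disc.astype(bool)


# Toy input: three segments of speaker A, one overlapped segment, three of B.
emb = np.array([[1, 0]] * 3 + [[1, 1]] + [[0, 1]] * 3, dtype=float)
emb /= np.linalg.norm(emb, axis=1, keepdims=True)
mask = np.array([False, False, False, True, False, False, False])
labels = overlap_aware_spectral_clustering(emb @ emb.T, 2, mask)
print(labels.astype(int))
```

On the toy input, the overlapped segment is assigned to both speakers while every other segment keeps a single label. In practice the affinity would come from speaker embeddings and the mask from an overlap detector such as the HMM-DNN model the abstract describes.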
