SOT Triggered Neural Clustering for Speaker Attributed ASR
Xianrui Zheng, Guangzhi Sun, Chao Zhang, Philip C. Woodland

TL;DR
This paper presents a neural clustering approach for speaker-attributed ASR that enables simultaneous diarisation and transcription, reducing error propagation and improving accuracy on meeting datasets.
Contribution
It introduces a segment-level discriminative neural clustering method that integrates speaker diarisation directly into the ASR system without needing separate clustering algorithms.
Findings
SDNC reduces the diarisation error rate (DER) by 19% relative to spectral clustering (SC) on the AMI Eval set.
The parallel SDNC system improves cpWER by 7%/4% on the AMI Dev/Eval sets.
The system is built entirely from neural networks, removing the need for a separate non-neural clustering step.
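The cpWER figure above is easier to read with the metric spelled out. The sketch below assumes the commonly used definition of cpWER (concatenated minimum-permutation word error rate): each speaker's words are concatenated, every mapping between reference and hypothesis speakers is tried, and the mapping with the fewest word errors is scored. It is not the paper's scoring script; the function names `edit_distance` and `cpwer` and the toy transcripts are hypothetical.

```python
from itertools import permutations


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cpwer(ref_by_speaker, hyp_by_speaker):
    """cpWER under the assumed definition: best speaker permutation, pooled errors."""
    refs = [r.split() for r in ref_by_speaker]
    hyps = [h.split() for h in hyp_by_speaker]
    # Pad with empty speakers so both sides have the same number of streams.
    n = max(len(refs), len(hyps))
    refs += [[] for _ in range(n - len(refs))]
    hyps += [[] for _ in range(n - len(hyps))]
    total_ref_words = sum(len(r) for r in refs)
    # Brute-force search over speaker mappings; fine for a handful of speakers.
    best_errors = min(
        sum(edit_distance(r, hyps[p]) for r, p in zip(refs, perm))
        for perm in permutations(range(n))
    )
    return best_errors / total_ref_words


# Toy example: hypothesis speakers are swapped and contain one wrong word.
refs = ["hello there how are you", "fine thanks"]
hyps = ["fine thanks", "hello there how is you"]
score = cpwer(refs, hyps)
```

Because the speaker mapping is chosen to minimise errors, a system is not penalised for using different speaker labels from the reference, only for attributing words to the wrong speaker or transcribing them incorrectly.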
Abstract
This paper introduces a novel approach to speaker-attributed ASR transcription using a neural clustering method. With a parallel processing mechanism, diarisation and ASR can be applied simultaneously, helping to prevent the accumulation of errors from one sub-system to the next in a cascaded system. This is achieved by the use of ASR, trained using a serialised output training method, together with segment-level discriminative neural clustering (SDNC) to assign speaker labels. With SDNC, our system does not require an extra non-neural clustering method to assign speaker labels, thus allowing the entire system to be based on neural networks. Experimental results on the AMI meeting dataset demonstrate that SDNC outperforms spectral clustering (SC) by a 19% relative diarisation error rate (DER) reduction on the AMI Eval set. When compared with the cascaded system with SC, the parallel…
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing
Methods: Spectral Clustering
