Uncertainty Quantification in Machine Learning for Joint Speaker Diarization and Identification
Simon W. McKnight, Aidan O. T. Hogg, Vincent W. Neo, Patrick A. Naylor

TL;DR
This paper explores uncertainty quantification in machine learning models for joint speaker diarization and identification, demonstrating how aleatoric and epistemic uncertainties can improve model reliability and performance, especially with model ensembles.
Contribution
It introduces a comprehensive analysis of uncertainty types in JSID models using CNNs and LSTMs, and proposes methods to leverage these uncertainties for enhanced speaker diarization accuracy.
Findings
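Models on both modulation spectrum features and mel-frequency cepstral coefficients have better diarization error rates (DERs) than models on either feature type alone.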
Model ensembles with Kalman filter smoothing outperform individual models in overlapping speaker scenarios.
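As a rough illustration of the ensemble-plus-smoothing idea, the sketch below averages per-frame speaker scores across ensemble members and then runs a scalar random-walk Kalman filter over the averaged track. Everything in it is an assumption made for illustration: the function names, the noise variances, the toy scores, and the choice of a forward-only filter with a random-walk state model. The summary above does not specify the paper's actual state model or smoothing pass.

```python
def ensemble_mean(per_model_scores):
    """Average per-frame scores across ensemble members (one list per model)."""
    n_models = len(per_model_scores)
    return [sum(frame) / n_models for frame in zip(*per_model_scores)]


def kalman_filter(observations, process_var=1e-3, obs_var=1e-1):
    """Forward scalar Kalman filter with an assumed random-walk state model."""
    estimate = observations[0]  # initial state estimate
    variance = 1.0              # initial state variance (hypothetical value)
    filtered = []
    for z in observations:
        # Predict: a random walk only inflates the state variance.
        variance += process_var
        # Update: blend the prediction and the observation by the Kalman gain.
        gain = variance / (variance + obs_var)
        estimate += gain * (z - estimate)
        variance *= 1.0 - gain
        filtered.append(estimate)
    return filtered


# Toy per-frame scores for one speaker from three hypothetical models.
scores = [
    [0.90, 0.80, 0.20, 0.90, 0.85],
    [0.80, 0.90, 0.40, 0.80, 0.90],
    [0.85, 0.70, 0.30, 0.95, 0.80],
]
averaged = ensemble_mean(scores)
smoothed = kalman_filter(averaged)
```

In this toy track the third frame dips sharply in all three models; the filter pulls that frame back toward its neighbours, which is the kind of temporal regularisation a smoothing stage is meant to provide.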
Abstract
This paper studies modulation spectrum features and mel-frequency cepstral coefficients (MFCCs) in joint speaker diarization and identification (JSID). JSID is important because speaker diarization on its own, which only distinguishes speakers, is insufficient for many applications; it is often necessary to identify the speakers as well. Machine learning models are set up using convolutional neural networks (CNNs) on the modulation spectrum features and long short-term memory (LSTM) recurrent neural networks on the MFCCs, with the outputs concatenated into fully connected layers. Experiment 1 shows that models on both feature types have better diarization error rates (DERs) than models on either alone: a CNN on modulation spectrum features has a DER of 29.09%, compared to 27.78% for an LSTM on MFCCs and 19.44% for a model on both. Experiment 1 also investigates aleatoric uncertainties and shows the model on both feature types has mean…
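To make the aleatoric/epistemic distinction concrete, here is a minimal sketch of one common information-theoretic decomposition for an ensemble (or for Monte Carlo dropout samples): total uncertainty is the entropy of the averaged prediction, the aleatoric part is the average entropy of the individual predictions, and the epistemic part is the difference. The function names and toy probabilities are hypothetical, and the summary above does not state which estimator the paper itself uses.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def decompose_uncertainty(member_probs):
    """Split predictive uncertainty for one frame into aleatoric and epistemic parts.

    member_probs holds one class-probability list per ensemble member.
    """
    n = len(member_probs)
    mean_probs = [sum(col) / n for col in zip(*member_probs)]
    total = entropy(mean_probs)                           # entropy of the mean prediction
    aleatoric = sum(entropy(p) for p in member_probs) / n  # mean entropy of the members
    epistemic = total - aleatoric                          # disagreement between members
    return total, aleatoric, epistemic


# Members that agree: all of the uncertainty is aleatoric.
agree = [[0.9, 0.1], [0.9, 0.1], [0.9, 0.1]]
# Members that disagree: part of the uncertainty is epistemic.
disagree = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]

total_a, aleatoric_a, epistemic_a = decompose_uncertainty(agree)
total_d, aleatoric_d, epistemic_d = decompose_uncertainty(disagree)
```

When every member returns the same distribution the epistemic term vanishes; when members disagree it becomes positive, which is why ensembles are a natural fit for estimating it.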
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing
Methods: Sparse Evolutionary Training · Tanh Activation · Sigmoid Activation · Long Short-Term Memory · Dropout · Monte Carlo Dropout
