TL;DR
This paper studies VBx, a Bayesian HMM clustering method for speaker diarization that operates on x-vectors. It shows that VBx outperforms other published approaches on three standard datasets (CALLHOME, AMI and DIHARD II), and it provides the model's derivation, implementation details and a new evaluation protocol.
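To make "Bayesian HMM clustering of x-vectors" more concrete, here is a minimal sketch (not the authors' code) of the step in which an HMM whose states are speakers assigns each x-vector in the sequence to a speaker. It assumes that per-x-vector log-likelihoods for each speaker are already available (the hypothetical `log_lik` input). It also assumes a simple topology: the model stays with the current speaker with probability `p_loop` and otherwise re-draws a speaker from prior weights `pi`. All names are illustrative. Forward-backward then yields the posterior responsibility of each speaker for each x-vector.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def speaker_posteriors(log_lik, pi, p_loop):
    """Forward-backward over an HMM with one state per speaker (sketch).

    log_lik[t][s]: log-likelihood of x-vector t under speaker s (assumed given).
    pi[s]: positive prior speaker weights summing to 1, also used as the
           initial-state distribution (an assumption of this sketch).
    p_loop: probability of staying with the current speaker, 0 <= p_loop < 1.
    Returns gamma[t][s], the posterior probability that x-vector t
    belongs to speaker s.
    """
    T, S = len(log_lik), len(pi)
    # Transition: stay with prob. p_loop, otherwise re-draw a speaker from pi.
    log_tr = [[math.log(p_loop * (i == j) + (1 - p_loop) * pi[j])
               for j in range(S)] for i in range(S)]
    log_alpha = [[0.0] * S for _ in range(T)]
    log_beta = [[0.0] * S for _ in range(T)]  # log_beta[T-1] stays 0
    # Forward pass.
    for s in range(S):
        log_alpha[0][s] = math.log(pi[s]) + log_lik[0][s]
    for t in range(1, T):
        for j in range(S):
            log_alpha[t][j] = log_lik[t][j] + logsumexp(
                [log_alpha[t - 1][i] + log_tr[i][j] for i in range(S)])
    # Backward pass.
    for t in range(T - 2, -1, -1):
        for i in range(S):
            log_beta[t][i] = logsumexp(
                [log_tr[i][j] + log_lik[t + 1][j] + log_beta[t + 1][j]
                 for j in range(S)])
    # Total log-likelihood of the sequence, used to normalize posteriors.
    log_px = logsumexp(log_alpha[T - 1])
    return [[math.exp(log_alpha[t][s] + log_beta[t][s] - log_px)
             for s in range(S)] for t in range(T)]


# Usage: four x-vectors, the first two favouring speaker 0 and the last two speaker 1.
log_lik = [[0.0, -5.0], [0.0, -5.0], [-5.0, 0.0], [-5.0, 0.0]]
gamma = speaker_posteriors(log_lik, [0.5, 0.5], 0.9)
```

A high `p_loop` discourages rapid speaker changes between neighbouring x-vectors. This temporal smoothing is what distinguishes HMM-based clustering from clustering the x-vectors independently.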
Contribution
It presents the first published derivation and update formulae of the VBx model, with a focus on its efficiency and simplicity, along with state-of-the-art diarization results and a new evaluation protocol for the AMI data.
Findings
VBx outperforms other diarization methods on the CALLHOME, AMI and DIHARD II datasets.
The paper provides the derivation and update formulas for the VBx model.
A new standardized evaluation protocol is proposed for the AMI dataset.
Abstract
The recently proposed VBx diarization method uses a Bayesian hidden Markov model to find speaker clusters in a sequence of x-vectors. In this work we perform an extensive comparison of the performance of VBx diarization with other approaches in the literature, and we show that VBx achieves superior performance on three of the most popular datasets for evaluating diarization: CALLHOME, AMI and DIHARD II. Further, we present for the first time the derivation and update formulae for the VBx model, focusing on the efficiency and simplicity of this model as compared to the previous and more complex BHMM model working on standard frame-by-frame cepstral features. Together with this publication, we release the recipe for training the x-vector extractors used in our experiments on both wide and narrowband data, and the VBx recipes that attain state-of-the-art performance on all three…
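The update formulae themselves are not reproduced in this summary. As a rough illustration only, the sketch below shows one standard variational-Bayes update of the speaker models under an assumed simplified model that is not taken from the text above. In that model, the x-vectors are already transformed so that the within-speaker covariance is the identity. Each speaker s has a latent vector y_s with a standard normal prior, and x_t = diag(sqrt(phi)) y_s + noise. The function name `update_speaker_models` and the parameters `phi`, `fa` and `fb` (likelihood and prior scaling factors) are hypothetical. This is a sketch of the general technique, not the paper's derivation.

```python
import math


def update_speaker_models(x, gamma, phi, fa=1.0, fb=1.0):
    """One variational update of the speaker models (illustrative sketch).

    x[t][d]: x-vectors, assumed already transformed so that the within-speaker
             covariance is the identity and the between-speaker covariance
             is diag(phi).
    gamma[t][s]: current responsibilities of speaker s for x-vector t.
    phi[d]: assumed diagonal between-speaker variances.
    fa, fb: assumed scaling factors on the likelihood and prior terms.
    Returns (alpha, inv_l, log_p): posterior means and diagonal posterior
    covariances of each speaker's latent vector, and the expected
    log-likelihood of every x-vector under every speaker.
    """
    T, D, S = len(x), len(phi), len(gamma[0])
    sqrt_phi = [math.sqrt(p) for p in phi]
    # Project x-vectors onto the speaker-loading directions.
    rho = [[x[t][d] * sqrt_phi[d] for d in range(D)] for t in range(T)]
    # Per-x-vector constant: -0.5 * (||x_t||^2 + D * log(2*pi)).
    g = [-0.5 * (sum(v * v for v in x[t]) + D * math.log(2 * math.pi))
         for t in range(T)]
    ratio = fa / fb
    alpha, inv_l = [], []
    for s in range(S):
        n_s = sum(gamma[t][s] for t in range(T))  # soft count for speaker s
        # Diagonal posterior covariance of y_s: (I + ratio * n_s * diag(phi))^-1.
        il = [1.0 / (1.0 + ratio * n_s * phi[d]) for d in range(D)]
        acc = [sum(gamma[t][s] * rho[t][d] for t in range(T)) for d in range(D)]
        alpha.append([ratio * il[d] * acc[d] for d in range(D)])
        inv_l.append(il)
    # Expected log-likelihood of x_t under speaker s, scaled by fa.
    log_p = [[fa * (sum(rho[t][d] * alpha[s][d] for d in range(D))
                    - 0.5 * sum((inv_l[s][d] + alpha[s][d] ** 2) * phi[d]
                                for d in range(D))
                    + g[t])
              for s in range(S)] for t in range(T)]
    return alpha, inv_l, log_p


# Usage: two clearly separated groups of x-vectors with hard assignments.
x = [[2.0, 0.0], [2.0, 0.0], [0.0, 2.0], [0.0, 2.0]]
gamma = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
alpha, inv_l, log_p = update_speaker_models(x, gamma, [4.0, 4.0])
```

Because every covariance in this simplified model is diagonal, the update costs O(T·S·D) and needs no matrix inversions. That is one way operating on one x-vector per segment, rather than on frame-level features, can keep such a model cheap. In a full VB loop, `log_p` would feed an HMM forward-backward pass that recomputes `gamma`, and the two steps would alternate until convergence.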
