Robust Speaker Clustering using Mixtures of von Mises-Fisher Distributions for Naturalistic Audio Streams

Harishchandra Dubey; Abhijeet Sangwan; John H. L. Hansen

arXiv:1808.06045·cs.SD·August 21, 2018

TL;DR

This paper introduces a robust speaker clustering method using mixtures of von Mises-Fisher distributions, significantly improving diarization accuracy in naturalistic multi-speaker audio streams.

Contribution

The study proposes a novel speaker clustering approach based on von Mises-Fisher mixture models, tailored for high-dimensional normalized i-Vectors in naturalistic settings.
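For reference, the standard textbook form of the von Mises-Fisher density on the unit sphere, and of a mixture built from it, is given below. The symbols (x for a unit-length vector of dimension d, μ_h for a mean direction, κ_h for a concentration, α_h for a mixture weight, H for the number of components) are chosen here for illustration and are not taken from the paper.

```latex
f(x \mid \mu, \kappa) = C_d(\kappa)\, \exp\!\left(\kappa\, \mu^{\top} x\right),
\qquad
C_d(\kappa) = \frac{\kappa^{d/2 - 1}}{(2\pi)^{d/2}\, I_{d/2 - 1}(\kappa)},
\qquad \|x\| = \|\mu\| = 1,\ \kappa \ge 0

p(x) = \sum_{h=1}^{H} \alpha_h\, f(x \mid \mu_h, \kappa_h),
\qquad \alpha_h \ge 0,\ \sum_{h=1}^{H} \alpha_h = 1
```

Here I_ν denotes the modified Bessel function of the first kind of order ν. A larger κ concentrates the density more tightly around the mean direction μ.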

Findings

01

Achieved up to 44.48% relative improvement on the PLTL corpus.

02

Achieved up to 53.68% relative improvement on the AMI corpus.

03

Outperformed baseline K-means clustering with cosine distance.

Abstract

Speaker diarization (i.e., determining who spoke and when) for multi-speaker naturalistic interactions such as Peer-Led Team Learning (PLTL) sessions is a challenging task. In this study, we propose robust speaker clustering based on a mixture of multivariate von Mises-Fisher distributions. Our diarization pipeline has two stages: (i) ground-truth segmentation; (ii) proposed speaker clustering. The ground-truth speech activity information is used for extracting i-Vectors from each speech segment. We post-process the i-Vectors with principal component analysis for dimension reduction followed by length normalization. Normalized i-Vectors are high-dimensional unit vectors possessing discriminative directional characteristics. We model the normalized i-Vectors with a mixture model consisting of multivariate von Mises-Fisher distributions. K-means clustering with cosine distance is chosen as…
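The post-processing and the cosine-distance K-means step mentioned in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the function names (pca_reduce, length_normalize, cosine_kmeans), the number of retained components, the iteration count, and the random initialization are all assumptions, and the von Mises-Fisher mixture itself is not implemented here.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project the rows of X onto the top principal components (via SVD)."""
    Xc = X - X.mean(axis=0, keepdims=True)  # center each dimension
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def length_normalize(X, eps=1e-12):
    """Scale each row to unit Euclidean length."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.maximum(norms, eps)


def cosine_kmeans(U, k, n_iter=50, seed=0):
    """K-means on unit vectors using cosine similarity (hypothetical helper).

    Each point is assigned to the center with the largest dot product, and
    each center is the re-normalized sum of its members.
    """
    rng = np.random.default_rng(seed)
    centers = U[rng.choice(len(U), size=k, replace=False)]  # copy of k rows
    for _ in range(n_iter):
        labels = np.argmax(U @ centers.T, axis=1)
        for j in range(k):
            members = U[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[j] = length_normalize(
                    members.sum(axis=0, keepdims=True)
                )[0]
    labels = np.argmax(U @ centers.T, axis=1)  # final assignment
    return labels, centers


if __name__ == "__main__":
    # Synthetic stand-in for per-segment i-Vectors (assumed sizes).
    rng = np.random.default_rng(0)
    ivectors = rng.normal(size=(200, 100))
    unit_vectors = length_normalize(pca_reduce(ivectors, n_components=20))
    labels, centers = cosine_kmeans(unit_vectors, k=4)
    print(labels.shape, centers.shape)
```

Because every vector has unit length after normalization, ranking centers by dot product is equivalent to ranking them by cosine distance, which is why only directions matter in this clustering step.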
