Toroidal Probabilistic Spherical Discriminant Analysis

Anna Silnova, Niko Brümmer, Albert Swart, Lukáš Burget

arXiv:2210.15441 · cs.SD · October 28, 2022

TL;DR

This paper introduces T-PSDA, an extension of PSDA that models within- and between-speaker variabilities in toroidal submanifolds of the hypersphere. It matches cosine scoring on VoxCeleb, where PLDA is less accurate, and gives large accuracy gains over both cosine scoring and PLDA on NIST SRE'21.

Contribution

T-PSDA extends PSDA with the ability to model within- and between-speaker variabilities in toroidal submanifolds of the hypersphere, while retaining closed-form scoring and closed-form EM updates for training.

Findings

01

T-PSDA matches cosine scoring accuracy on VoxCeleb.

02

T-PSDA gives large accuracy gains over both cosine scoring and PLDA on NIST SRE'21.

03

The model represents within- and between-speaker variabilities in toroidal submanifolds of the hypersphere.

Abstract

In speaker recognition, where speech segments are mapped to embeddings on the unit hypersphere, two scoring back-ends are commonly used, namely cosine scoring and PLDA. We have recently proposed PSDA, an analog to PLDA that uses von Mises-Fisher distributions instead of Gaussians. In this paper, we present toroidal PSDA (T-PSDA). It extends PSDA with the ability to model within- and between-speaker variabilities in toroidal submanifolds of the hypersphere. Like PLDA and PSDA, the model allows closed-form scoring and closed-form EM updates for training. On VoxCeleb, we find T-PSDA accuracy on par with cosine scoring, while PLDA accuracy is inferior. On NIST SRE'21, we find that T-PSDA gives large accuracy gains compared to both cosine scoring and PLDA.
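
For readers unfamiliar with the cosine scoring baseline mentioned in the abstract, the following is a minimal sketch of it, not code from the paper. It assumes embeddings are stored as rows of a NumPy array; the function names `length_normalize` and `cosine_score` and the toy 3-dimensional vectors are hypothetical and chosen only for illustration.

```python
import numpy as np


def length_normalize(x: np.ndarray) -> np.ndarray:
    """Project each embedding (row) onto the unit hypersphere."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / norms


def cosine_score(enroll: np.ndarray, test: np.ndarray) -> np.ndarray:
    """Cosine similarity between every enrollment and every test embedding.

    After length normalization the cosine similarity is just a dot product,
    so the full score matrix is a single matrix multiplication.
    """
    return length_normalize(enroll) @ length_normalize(test).T


# Toy embeddings, invented for illustration (real ones have hundreds of dims).
enroll = np.array([[1.0, 0.0, 0.0],
                   [0.0, 2.0, 0.0]])
test = np.array([[3.0, 0.0, 0.0],
                 [0.0, 1.0, 1.0]])

scores = cosine_score(enroll, test)  # shape: (num_enroll, num_test)
print(scores)
```

A higher score suggests the two segments are more likely to come from the same speaker. Cosine scoring has no trainable parameters, whereas PLDA, PSDA, and T-PSDA are probabilistic back-ends trained on labeled embeddings.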

Taxonomy

Topics: Speech Recognition and Synthesis · Neural Networks and Applications · Speech and Audio Processing