Toroidal Probabilistic Spherical Discriminant Analysis
Anna Silnova, Niko Brümmer, Albert Swart, Lukáš Burget

TL;DR
This paper introduces T-PSDA, an extension of PSDA that models within- and between-speaker variabilities in toroidal submanifolds of the hypersphere. It is on par with cosine scoring on VoxCeleb and gives large accuracy gains over both cosine scoring and PLDA on NIST SRE'21.
Contribution
T-PSDA extends PSDA with the ability to model within- and between-speaker variabilities in toroidal submanifolds of the hypersphere, while keeping closed-form scoring and closed-form EM updates for training.
Findings
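On VoxCeleb, T-PSDA accuracy is on par with cosine scoring, while PLDA accuracy is inferior.
On NIST SRE'21, T-PSDA gives large accuracy gains over both cosine scoring and PLDA.
The model represents within- and between-speaker variabilities in toroidal submanifolds of the hypersphere.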
Abstract
In speaker recognition, where speech segments are mapped to embeddings on the unit hypersphere, two scoring back-ends are commonly used, namely cosine scoring and PLDA. We have recently proposed PSDA, an analog to PLDA that uses von Mises-Fisher distributions instead of Gaussians. In this paper, we present toroidal PSDA (T-PSDA). It extends PSDA with the ability to model within- and between-speaker variabilities in toroidal submanifolds of the hypersphere. Like PLDA and PSDA, the model allows closed-form scoring and closed-form EM updates for training. On VoxCeleb, we find T-PSDA accuracy on par with cosine scoring, while PLDA accuracy is inferior. On NIST SRE'21, we find that T-PSDA gives large accuracy gains compared to both cosine scoring and PLDA.
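For readers unfamiliar with the cosine scoring baseline named in the abstract, the following is a minimal sketch of it, not the paper's T-PSDA scoring. It assumes embeddings are plain lists of floats; the function names `length_normalize` and `cosine_score` and the toy vectors are hypothetical and chosen only for illustration.

```python
import math


def length_normalize(vec):
    """Project an embedding onto the unit hypersphere (hypothetical helper)."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [v / norm for v in vec]


def cosine_score(enroll, test):
    """Cosine similarity between two embeddings: the dot product after
    both are length-normalized. Higher means more likely the same speaker."""
    e = length_normalize(enroll)
    t = length_normalize(test)
    return sum(a * b for a, b in zip(e, t))


# Toy 3-dimensional embeddings (made-up values, not real speaker data).
enroll = [3.0, 4.0, 0.0]
same_direction = [6.0, 8.0, 0.0]   # same direction, different length
orthogonal = [0.0, 0.0, 2.0]       # orthogonal direction

score_same = cosine_score(enroll, same_direction)
score_other = cosine_score(enroll, orthogonal)
```

Because both inputs are normalized first, the score depends only on the angle between the embeddings, which is why this back-end suits embeddings that live on the unit hypersphere.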
Taxonomy
Topics: Speech Recognition and Synthesis · Neural Networks and Applications · Speech and Audio Processing
