PLDA with Two Sources of Inter-session Variability

Jesús Villalba

arXiv:1511.06772 · stat.ML · November 24, 2015

TL;DR

This paper introduces a modified PLDA model that accounts for two sources of inter-session variability in multi-channel speaker recognition scenarios, improving the modeling of conversation and channel differences.

Contribution

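It proposes a PLDA extension with two inter-session variability terms: one tied across all recordings of the same conversation, mainly capturing its phonetic content, and one drawn per recording, mainly capturing channel effects. It also derives the equations for this model.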
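One way to write such a model, as a minimal sketch assuming a simplified linear-Gaussian form: the indices (i for speaker, j for conversation, k for channel), the loading matrices V, U, W, and the residual covariance Σ are illustrative and may not match the paper's notation.

```latex
\boldsymbol{\phi}_{ijk} = \boldsymbol{\mu} + \mathbf{V}\mathbf{y}_i + \mathbf{U}\mathbf{h}_{ij} + \mathbf{W}\mathbf{w}_{ijk} + \boldsymbol{\epsilon}_{ijk},
\qquad
\mathbf{y}_i,\ \mathbf{h}_{ij},\ \mathbf{w}_{ijk} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}),
\qquad
\boldsymbol{\epsilon}_{ijk} \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma})
```

In this form, h_ij is shared by all channels that captured conversation j of speaker i, while w_ijk is drawn independently for each recording k.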

Findings

1. Enhanced modeling of multi-channel recordings
2. Improved speaker recognition accuracy
3. Applicable to simultaneous multi-channel data

Abstract

In some speaker recognition scenarios, conversations are recorded simultaneously over multiple channels; this is the case for the interviews in the NIST SRE dataset. To take advantage of this, we propose a modification of the PLDA model that considers two different inter-session variability terms. The first term is tied across all recordings belonging to the same conversation, whereas the second is not. Thus, the former mainly intends to capture the variability due to the phonetic content of the conversation, while the latter tries to capture the channel variability. In this document, we derive the equations for this model. The model was applied in the paper "Handling Recordings Acquired Simultaneously over Multiple Channels with PLDA", published at Interspeech 2013.
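The tying structure described in the abstract can be illustrated with a small NumPy sketch. It assumes a simplified generative form x = mu + V y + U h + W w + eps with standard-normal latent factors; the function names `sample_speaker` and `cross_covariance` and all matrix shapes are hypothetical and not taken from the paper. The sketch draws recordings for one speaker and computes the cross-covariance that two recordings inherit from the latent terms they share.

```python
import numpy as np

# Assumed (simplified) generative model, not the paper's exact notation:
#   x = mu + V @ y + U @ h + W @ w + eps
#   y   ~ N(0, I): speaker factor, shared by every recording of the speaker
#   h   ~ N(0, I): conversation factor, shared by all channels of one conversation
#   w   ~ N(0, I): channel factor, drawn independently for each recording
#   eps ~ N(0, Sigma): residual noise, drawn independently for each recording


def sample_speaker(rng, mu, V, U, W, Sigma, n_conversations, n_channels):
    """Return an array of shape (n_conversations, n_channels, d) for one speaker."""
    d = mu.shape[0]
    chol = np.linalg.cholesky(Sigma)            # Sigma must be positive definite
    y = rng.standard_normal(V.shape[1])         # tied across all recordings
    X = np.empty((n_conversations, n_channels, d))
    for c in range(n_conversations):
        h = rng.standard_normal(U.shape[1])     # tied across the channels of conversation c
        for k in range(n_channels):
            w = rng.standard_normal(W.shape[1])  # not tied: one draw per recording
            eps = chol @ rng.standard_normal(d)
            X[c, k] = mu + V @ y + U @ h + W @ w + eps
    return X


def cross_covariance(V, U, W, Sigma, same_speaker, same_conversation, same_recording):
    """Cov(x_a, x_b) implied by the latent terms two recordings share."""
    if same_conversation and not same_speaker:
        raise ValueError("conversation factors are nested within a speaker in this sketch")
    if same_recording and not same_conversation:
        raise ValueError("a recording belongs to a single conversation")
    d = V.shape[0]
    C = np.zeros((d, d))
    if same_speaker:
        C += V @ V.T
    if same_conversation:
        C += U @ U.T
    if same_recording:
        C += W @ W.T + Sigma                    # x_a is x_b: full marginal covariance
    return C
```

Under these assumptions, two simultaneous channels of one conversation share VVᵀ + UUᵀ, while two different conversations of the same speaker share only VVᵀ. A PLDA model with a single per-recording inter-session term cannot express that extra conversation-level correlation.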