PLDA with Two Sources of Inter-session Variability
Jesús Villalba

TL;DR
This paper introduces a modified PLDA model with two sources of inter-session variability for speaker recognition on conversations recorded simultaneously over multiple channels, so that conversation-level and channel-level differences are modeled separately.
Contribution
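It proposes a PLDA extension with two inter-session variability terms: one tied across all recordings of the same conversation (mainly capturing phonetic content) and one specific to each recording (capturing channel effects). It also derives the equations for this model.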
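A minimal sketch of such a model in standard PLDA notation, assuming a linear-Gaussian form. The symbols below (speaker factor y_i, conversation factor x_{ij}, channel residual \epsilon_{ijl}, loadings V and U, channel covariance \Sigma_c) are illustrative assumptions, not the report's own notation; the report may parametrize the terms differently (for example with precision matrices). Here i indexes speakers, j conversations and l the channels (recordings) of a conversation; the second line is the covariance these assumptions imply between two recordings of the same speaker.

```latex
\phi_{ijl} = \mu + V y_i + U x_{ij} + \epsilon_{ijl}, \qquad
y_i \sim \mathcal{N}(0, I), \quad
x_{ij} \sim \mathcal{N}(0, I), \quad
\epsilon_{ijl} \sim \mathcal{N}(0, \Sigma_c)
\\
\operatorname{Cov}(\phi_{ijl}, \phi_{ij'l'}) = V V^{\top} + \delta_{jj'}\, U U^{\top} + \delta_{jj'}\, \delta_{ll'}\, \Sigma_c
```

Because x_{ij} is shared by every channel of conversation j, simultaneous recordings are more strongly correlated with each other than with other conversations of the same speaker.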
Findings
Enhanced modeling of multi-channel recordings
Improved speaker recognition accuracy
Applicable to simultaneous multi-channel data
Abstract
In some speaker recognition scenarios, conversations are recorded simultaneously over multiple channels. This is the case for the interviews in the NIST SRE dataset. To take advantage of this, we propose a modification of the PLDA model that considers two different inter-session variability terms. The first term is tied across all the recordings belonging to the same conversation, whereas the second is not. Thus, the former mainly intends to capture the variability due to the phonetic content of the conversation, while the latter tries to capture the channel variability. In this document, we derive the equations for this model. This model was applied in the paper "Handling Recordings Acquired Simultaneously over Multiple Channels with PLDA", published at Interspeech 2013.
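The effect of tying one term per conversation and leaving the other per recording can be sketched by computing the covariance between two recordings. This is a minimal sketch assuming a linear-Gaussian form phi = mu + V y_i + U x_ij + eps_ijl, with a speaker factor y_i, a conversation factor x_ij shared by all channels of conversation j, and a per-recording channel term eps_ijl with covariance Sigma. The names V, U, Sigma, gram, mat_sum and cross_covariance are hypothetical, and the report's own parametrization may differ.

```python
from typing import List, Tuple

Matrix = List[List[float]]
# Hypothetical recording descriptor: (speaker_id, conversation_id, recording_id).
Recording = Tuple[str, str, str]


def gram(A: Matrix) -> Matrix:
    """Return A @ A^T for a d x k loading matrix stored as a list of rows."""
    return [[sum(a * b for a, b in zip(row_i, row_j)) for row_j in A] for row_i in A]


def mat_sum(*mats: Matrix) -> Matrix:
    """Element-wise sum of equally sized square matrices."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*mats)]


def cross_covariance(V: Matrix, U: Matrix, Sigma: Matrix,
                     rec_a: Recording, rec_b: Recording) -> Matrix:
    """Cov(phi_a, phi_b) under the assumed model
    phi = mu + V y_i + U x_ij + eps_ijl.

    Conversation and recording ids are assumed to be globally unique.
    """
    d = len(V)
    spk_a, conv_a, id_a = rec_a
    spk_b, conv_b, id_b = rec_b
    if spk_a != spk_b:
        # Different speakers share no latent variable, so they are independent.
        return [[0.0] * d for _ in range(d)]
    terms = [gram(V)]            # speaker factor y_i is shared
    if conv_a == conv_b:
        terms.append(gram(U))    # conversation factor x_ij is tied across channels
    if id_a == id_b:
        terms.append(Sigma)      # channel term eps_ijl belongs to one recording only
    return mat_sum(*terms)
```

For example, with V = [[1.0], [0.5]], U = [[0.0], [0.8]] and Sigma = 0.2 I, two channels of the same conversation get VV^T + UU^T, while two different conversations of the same speaker share only VV^T. Under these assumptions, this separation is what allows simultaneous recordings to be treated as more dependent than independent sessions.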
