TL;DR
This paper introduces a hierarchical PLDA-based model for spoken language recognition that substantially improves the detection of closely related languages by combining discriminative training with a novel two-level scoring approach.
Contribution
It proposes a new hierarchical PLDA model, trained discriminatively, that improves language detection, especially among closely related languages, and outperforms traditional methods.
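The summary does not say how the two levels are combined, so the following is only a generic sketch of hierarchical (two-level) scoring, not the paper's formulation. It assumes a first model that scores language clusters and a second model, trained only to separate languages inside each cluster, that scores individual languages; the function names, cluster names, and flat priors are all hypothetical. The language posterior is factored as P(language | x) = P(cluster | x) · P(language | cluster, x).

```python
import math

def log_softmax(scores):
    """Numerically stable log-softmax over a list of log-likelihoods (flat prior)."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]

def two_level_log_posteriors(cluster_loglik, within_loglik, clusters):
    """Hypothetical two-level scorer.

    cluster_loglik: {cluster: log p(x | cluster)} from a cluster-level model
    within_loglik:  {language: log p(x | language)} from a within-cluster model
    clusters:       {cluster: [language, ...]}
    Returns {language: log P(language | x)}, assuming flat priors at both levels.
    """
    names = list(clusters)
    # Level 1: posterior over clusters.
    cluster_post = dict(zip(names, log_softmax([cluster_loglik[c] for c in names])))
    result = {}
    for c in names:
        langs = clusters[c]
        # Level 2: posterior over languages, normalized only within cluster c.
        within_post = log_softmax([within_loglik[lang] for lang in langs])
        for lang, lp in zip(langs, within_post):
            result[lang] = cluster_post[c] + lp
    return result

if __name__ == "__main__":
    # Hypothetical clusters of related languages and made-up scores.
    clusters = {"iberian": ["es", "pt"], "slavic": ["ru", "uk"]}
    post = two_level_log_posteriors(
        {"iberian": -1.0, "slavic": -5.0},
        {"es": -2.0, "pt": -3.0, "ru": -4.0, "uk": -4.0},
        clusters,
    )
    for lang, lp in sorted(post.items()):
        print(lang, round(math.exp(lp), 4))
```

Because each level is normalized separately, the within-cluster model only has to tell related languages apart, which is one plausible motivation for a hierarchy; whether the paper's model works this way is not stated in the summary.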
Findings
Hierarchical PLDA model improves detection of related languages.
Discriminative training yields large performance gains.
Model is robust across diverse datasets and conditions.
Abstract
Spoken language recognition (SLR) refers to the automatic process of determining the language spoken in a speech sample. SLR is an important task in its own right, for example, as a tool to analyze or categorize large amounts of multilingual data. It is also an essential tool for selecting downstream applications in a workflow, for example, to choose appropriate speech recognition or machine translation models. SLR systems are usually composed of two stages: the first extracts an embedding representing the audio sample, and the second computes the final score for each language. In this work, we approach the SLR task as a detection problem and implement the second stage as a probabilistic linear discriminant analysis (PLDA) model. We show that discriminative training of the PLDA parameters gives large gains with respect to the usual generative training. Further,…
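To make the second stage concrete, here is a minimal NumPy sketch of a generatively trained Gaussian backend with a shared within-class covariance, followed by one-vs-rest detection log-likelihood ratios (LLRs). This is a simplification, not the authors' model: it corresponds to the PLDA limit in which each language's mean is estimated from many training embeddings, and it includes no discriminative training. The function names and the flat prior over competing languages are assumptions for illustration.

```python
import numpy as np

def train_gaussian_backend(X, y, num_classes):
    """ML (generative) estimate of per-language means and one shared within-class covariance."""
    dim = X.shape[1]
    means = np.zeros((num_classes, dim))
    within = np.zeros((dim, dim))
    for k in range(num_classes):
        Xk = X[y == k]
        means[k] = Xk.mean(axis=0)
        centered = Xk - means[k]
        within += centered.T @ centered
    within /= len(X)
    return means, within

def class_log_likelihoods(X, means, within):
    """log N(x | mu_k, W) for every embedding and language.

    Drops the normalizing constant -0.5*log|2*pi*W|, which is the same
    for every language and cancels in the LLRs below.
    """
    prec = np.linalg.inv(within)
    diffs = X[:, None, :] - means[None, :, :]  # shape (N, K, D)
    maha = np.einsum("nkd,de,nke->nk", diffs, prec, diffs)
    return -0.5 * maha

def detection_llrs(loglik):
    """One-vs-rest detection LLR, with a flat prior over the K-1 competing languages."""
    _, K = loglik.shape
    llrs = np.empty_like(loglik)
    for k in range(K):
        others = np.delete(loglik, k, axis=1)
        # Log of the average likelihood of the non-target languages.
        log_alt = np.logaddexp.reduce(others, axis=1) - np.log(K - 1)
        llrs[:, k] = loglik[:, k] - log_alt
    return llrs

if __name__ == "__main__":
    # Toy 2-D "embeddings" for three languages (made-up data).
    X = np.array([[0.0, 0.0], [0.2, -0.1], [-0.1, 0.1],
                  [5.0, 5.0], [5.1, 4.9], [4.9, 5.2],
                  [0.0, 5.0], [0.1, 5.1], [-0.1, 4.8]])
    y = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
    means, W = train_gaussian_backend(X, y, num_classes=3)
    test = np.array([[0.05, 0.0], [5.0, 5.0]])
    print(detection_llrs(class_log_likelihoods(test, means, W)))
```

Discriminative training, as described in the abstract, would instead adjust backend parameters such as `means` and `W` to optimize a classification objective on held-out scores rather than using these closed-form maximum-likelihood estimates.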
