TL;DR
This paper introduces a semisupervised neural network approach using Ladder Networks for language identification, effectively leveraging both labeled and unlabeled speech data, and handling out-of-set languages to improve accuracy.
Contribution
It presents a novel neural network architecture with Ladder Network training for language ID that incorporates unlabeled data and manages out-of-set languages.
Findings
Improved language identification accuracy on the NIST 2015 language identification dataset.
Effective use of unlabeled data in the training process.
Demonstrated handling of out-of-set languages.
Abstract
In this study, we address the problem of training a neural network for language identification using both labeled and unlabeled speech samples in the form of i-vectors. We propose a neural network architecture that can also handle out-of-set languages. We utilize a modified version of the recently proposed Ladder Network semisupervised training procedure that optimizes the reconstruction costs of a stack of denoising autoencoders. We show that this approach can be successfully applied to the case where the training dataset is composed of both labeled and unlabeled acoustic data. The results show enhanced language identification on the NIST 2015 language identification dataset.
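As a rough illustration of the training objective described in the abstract, the sketch below combines a supervised cross-entropy term, computed only on labeled samples, with weighted per-layer denoising reconstruction costs computed on all samples. It follows the general form of a Ladder Network cost, not the paper's modified version, whose details are not given here. The function names, the dictionary layout, and the layer weights are hypothetical, and out-of-set handling is not modeled.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    # Negative log-probability of the true language label.
    return -math.log(softmax(logits)[label])

def squared_error(clean, reconstructed):
    # Mean squared error between clean activations and their reconstruction.
    return sum((c - r) ** 2 for c, r in zip(clean, reconstructed)) / len(clean)

def ladder_cost(samples, layer_weights):
    """Semi-supervised cost over a batch (illustrative layout, an assumption).

    Each sample is a dict with:
      'logits': classifier outputs for the sample
      'label':  int class index, or None if the sample is unlabeled
      'clean':  list of per-layer clean activation vectors
      'recon':  list of per-layer denoised reconstructions
    """
    supervised, n_labeled, recon = 0.0, 0, 0.0
    for s in samples:
        if s["label"] is not None:
            supervised += cross_entropy(s["logits"], s["label"])
            n_labeled += 1
        # Reconstruction cost applies to labeled and unlabeled samples alike.
        for w, clean, rec in zip(layer_weights, s["clean"], s["recon"]):
            recon += w * squared_error(clean, rec)
    supervised = supervised / n_labeled if n_labeled else 0.0
    return supervised + recon / len(samples)

batch = [
    {"logits": [0.0, 0.0], "label": 0, "clean": [[1.0, 0.0]], "recon": [[1.0, 0.0]]},
    {"logits": [0.0, 0.0], "label": None, "clean": [[0.0, 0.0]], "recon": [[1.0, 1.0]]},
]
total = ladder_cost(batch, layer_weights=[0.5])
```

In this toy batch, only the first sample contributes to the supervised term, while both contribute to the reconstruction term. This is how unlabeled i-vectors can still shape the learned representation.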
