Cross-lingual Speaker Verification with Deep Feature Learning
Lantian Li, Dong Wang, Askar Rozi, Thomas Fang Zheng

TL;DR
This paper investigates how robust a deep learning-based speaker verification system is to language mismatch. With the model trained in English and enrollment and test conducted in Chinese or Uyghur, it outperforms conventional i-vector systems.
Contribution
It evaluates the authors' previously proposed deep feature learning approach to speaker verification under language mismatch and reports that it is more robust than systems built on probabilistic models.
Findings
Deep learning features outperform i-vector systems in cross-lingual tests
The system maintains accuracy despite language mismatch
Deep feature extraction enhances robustness in multilingual scenarios
Abstract
Existing speaker verification (SV) systems often suffer from performance degradation if there is any language mismatch between model training, speaker enrollment, and test. A major cause of this degradation is that most existing SV methods rely on a probabilistic model to infer the speaker factor, so any significant change in the distribution of the speech signal will impact the inference. Recently, we proposed a deep learning model that can learn how to extract the speaker factor with a deep neural network (DNN). With this feature learning, an SV system can be constructed with a very simple back-end model. In this paper, we investigate the robustness of the feature-based SV system in situations with language mismatch. Our experiments were conducted on a complex cross-lingual scenario, where the model training was in English, and the enrollment and test were in Chinese or Uyghur. The…
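To make the "very simple back-end" idea concrete, here is a minimal sketch of one possible back-end. It assumes frame-level speaker features have already been produced by some DNN, averages them per utterance, and scores enrollment against test with cosine similarity. The abstract does not specify the back-end, so the averaging step, the cosine score, the function names, the toy feature values, and the 0.5 threshold are all illustrative assumptions rather than the authors' configuration.

```python
import math


def average_frames(frames):
    """Average frame-level feature vectors into one utterance-level vector."""
    dim = len(frames[0])
    return [sum(frame[i] for frame in frames) / len(frames) for i in range(dim)]


def cosine_score(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def verify(enroll_frames, test_frames, threshold=0.5):
    """Return (score, decision); the threshold is a hypothetical value."""
    score = cosine_score(average_frames(enroll_frames), average_frames(test_frames))
    return score, score >= threshold


# Toy 2-dimensional "features" standing in for DNN outputs (made-up values).
enroll = [[1.0, 0.0], [0.8, 0.2]]
same_speaker = [[0.9, 0.1], [1.0, 0.1]]
other_speaker = [[0.0, 1.0], [0.1, 0.9]]

print(verify(enroll, same_speaker))
print(verify(enroll, other_speaker))
```

Because such a back-end has no distribution model of its own to fit, any robustness to language mismatch would have to come from the learned features, which is the property the paper sets out to test.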
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and Audio Processing
