Cross-lingual Speaker Verification with Deep Feature Learning

Lantian Li; Dong Wang; Askar Rozi; Thomas Fang Zheng

arXiv:1706.07861·cs.SD·June 27, 2017·2 cites

Open Access

TL;DR

This paper investigates how robust a deep learning-based speaker verification system is to language mismatch, and reports that it outperforms traditional methods in cross-lingual scenarios where training is in English and enrollment and testing are in Chinese or Uyghur.

Contribution

It shows that the authors' previously proposed deep feature learning approach to speaker verification maintains high performance under language mismatch, improving robustness over probabilistic models.

Findings

01. Deep learning features outperform i-vector systems in cross-lingual tests

02. The system maintains accuracy despite language mismatch

03. Deep feature extraction enhances robustness in multilingual scenarios

Abstract

Existing speaker verification (SV) systems often suffer from performance degradation if there is any language mismatch between model training, speaker enrollment, and test. A major cause of this degradation is that most existing SV methods rely on a probabilistic model to infer the speaker factor, so any significant change in the distribution of the speech signal will impact the inference. Recently, we proposed a deep learning model that learns to extract the speaker factor with a deep neural network (DNN). With this feature learning, an SV system can be constructed with a very simple back-end model. In this paper, we investigate the robustness of the feature-based SV system in situations with language mismatch. Our experiments were conducted on a complex cross-lingual scenario, where the model training was in English, and the enrollment and test were in Chinese or Uyghur. The…
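
To make the idea of a "very simple back-end model" concrete, here is a minimal Python sketch. It is an illustration under assumptions, not the authors' implementation: the abstract does not say which back-end is used, so the sketch assumes frame-level speaker features are averaged into one utterance-level vector and that enrollment and test vectors are compared by cosine similarity. The function names (`utterance_embedding`, `cosine_score`, `verify`) and the threshold value are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def utterance_embedding(frames: Sequence[Sequence[float]]) -> Vector:
    """Average frame-level speaker features into one utterance-level vector.

    Assumption: the frames stand in for the output of a feature-learning DNN,
    which is not modelled here.
    """
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    sums = [0.0] * dim
    for frame in frames:
        if len(frame) != dim:
            raise ValueError("all frames must have the same dimension")
        for i, value in enumerate(frame):
            sums[i] += value
    return [s / len(frames) for s in sums]


def cosine_score(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def verify(
    enroll_frames: Sequence[Sequence[float]],
    test_frames: Sequence[Sequence[float]],
    threshold: float = 0.5,  # hypothetical value; tuned on held-out data in practice
) -> bool:
    """Accept the identity claim if the cosine score reaches the threshold."""
    score = cosine_score(
        utterance_embedding(enroll_frames),
        utterance_embedding(test_frames),
    )
    return score >= threshold
```

In this sketch the back-end has no trained parameters beyond the decision threshold, so any statistical modelling of the speech signal would have to live in the feature extractor. This is one way to read the abstract's contrast with systems that infer the speaker factor through a probabilistic model.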

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and Audio Processing