U-vectors: Generating clusterable speaker embedding from unlabeled data

M. F. Mridha; Abu Quwsar Ohi; Muhammad Mostafa Monowar; Md. Abdul Hamid; Md. Rashedul Islam; Yutaka Watanobe

arXiv:2102.03868 · cs.SD · October 25, 2021

TL;DR

This paper presents an unsupervised method for generating clusterable speaker embeddings from unlabeled speech data, improving robustness across diverse domains without relying on domain adaptation.

Contribution

Introduces u-vectors, an unsupervised approach to produce speaker embeddings from unlabeled data, reducing dependence on domain-specific training and adaptation.

Findings

01 Achieves satisfactory speaker recognition performance on multiple datasets.

02 Demonstrates robustness across different languages and domain shifts.

03 Uses a pairwise architecture for effective unsupervised embedding generation.

Abstract

Speaker recognition deals with recognizing speakers by their speech. Most speaker recognition systems are built in two stages: the first extracts low-dimensional correlation embeddings from speech, and the second performs the classification task. The robustness of a speaker recognition system mainly depends on the extraction of speech embeddings, and the extractors are primarily pre-trained on large-scale datasets. Because the embedding systems are pre-trained, the performance of speaker recognition models depends greatly on the domain adaptation policy, and it may degrade if the adaptation is trained on inadequate data. This paper introduces a speaker recognition strategy for unlabeled data that generates clusterable embedding vectors from small fixed-size speech frames. The unsupervised training strategy assumes that a small speech segment contains a single speaker.…
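
The single-speaker assumption in the abstract suggests one way to obtain training pairs without labels: cut the audio into small fixed-size frames and treat frames that fall inside the same short segment as coming from the same speaker. The following is a minimal sketch of that idea only, not the authors' implementation. The function names, the 0.5 s frame length, the three-frames-per-segment grouping, and the use of random noise in place of speech are all assumptions made for illustration, and the abstract does not say how (or whether) dissimilar pairs are formed, so none are generated here.

```python
import numpy as np


def frame_signal(signal, frame_len):
    """Split a 1-D signal into non-overlapping fixed-size frames.

    Trailing samples that do not fill a whole frame are dropped.
    (Hypothetical helper, not taken from the paper's code.)
    """
    n_frames = len(signal) // frame_len
    return signal[: n_frames * frame_len].reshape(n_frames, frame_len)


def same_segment_pairs(n_frames, frames_per_segment):
    """Return index pairs (i, j), i < j, of frames in the same short segment.

    Under the assumption that a short segment contains a single speaker,
    each returned pair can be treated as a same-speaker pair.
    (Hypothetical helper, not taken from the paper's code.)
    """
    pairs = []
    for start in range(0, n_frames, frames_per_segment):
        stop = min(start + frames_per_segment, n_frames)
        for i in range(start, stop):
            for j in range(i + 1, stop):
                pairs.append((i, j))
    return pairs


# Three seconds of noise standing in for unlabeled 16 kHz speech (assumption).
rng = np.random.default_rng(0)
signal = rng.standard_normal(16000 * 3)

frames = frame_signal(signal, frame_len=8000)  # 0.5 s frames (assumed length)
pairs = same_segment_pairs(len(frames), frames_per_segment=3)
```

A pairwise model could then be fed `frames[i]` and `frames[j]` for each pair in `pairs`; the choice of encoder and loss is left open here because the abstract does not specify them.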

Code & Models

Repositories

QuwsarOhi/u-vectors (Official)

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing