Speaker Recognition in the Wild

Neeraj Chhimwal; Anirudh Gupta; Rishabh Gaur; Harveen Singh Chadha; Priyanshi Shah; Ankur Dhuriya; Vivek Raghavan

arXiv:2205.02475·cs.SD·May 6, 2022



TL;DR

This paper presents a pipeline for unsupervised speaker clustering that determines the number of speakers in a collection of audio and assigns each recording to a speaker, without prior knowledge of either. The pipeline serves as a data-preparation step for speech recognition.

Contribution

The authors introduce an unsupervised clustering pipeline and two metrics, Cluster Purity and Cluster Uniqueness, for evaluating speaker clusters in unlabeled audio data.

Findings

01  98% of data mapped to top 80% of clusters

02  Cluster Purity and Cluster Uniqueness metrics effectively evaluate clustering quality

03  Pipeline aids in preparing data for speech recognition models
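
The first finding can be read as a coverage statistic: rank clusters by size, keep the largest 80% of them, and measure what share of all recordings falls inside those clusters. The sketch below illustrates that reading only. The function name `top_cluster_coverage`, the `top_fraction` parameter, and the rounding rule are assumptions for illustration and are not taken from the paper.

```python
from collections import Counter


def top_cluster_coverage(cluster_ids, top_fraction=0.8):
    """Share of items that fall in the largest `top_fraction` of clusters.

    Assumption: "top" means largest by item count. The summary above does
    not state how the paper ranks clusters.
    """
    if not cluster_ids:
        raise ValueError("cluster_ids must not be empty")
    # Cluster sizes, largest first.
    sizes = sorted(Counter(cluster_ids).values(), reverse=True)
    # Number of clusters kept; always keep at least one.
    k = max(1, round(top_fraction * len(sizes)))
    return sum(sizes[:k]) / len(cluster_ids)


# Hypothetical assignment: five clusters holding 50, 30, 15, 4 and 1 items.
example_ids = [0] * 50 + [1] * 30 + [2] * 15 + [3] * 4 + [4] * 1
coverage = top_cluster_coverage(example_ids)  # 4 of 5 clusters kept -> 0.99
```

A high value under this reading means the small clusters that get discarded hold little data.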

Abstract

In this paper, we propose a pipeline to find the number of speakers, as well as the audio files belonging to each identified speaker, in a source of audio data where the number of speakers and the speaker labels are not known a priori. We used this approach as part of our Data Preparation pipeline for Speech Recognition in Indic Languages (https://github.com/Open-Speech-EkStep/vakyansh-wav2vec2-experimentation). To understand and evaluate the accuracy of our proposed pipeline, we introduce two metrics: Cluster Purity and Cluster Uniqueness. Cluster Purity quantifies how "pure" a cluster is. Cluster Uniqueness, on the other hand, quantifies what percentage of clusters belong only to a single dominant speaker. We discuss these metrics further in the metrics section. Since we develop this utility to aid us in identifying data based on speaker IDs before training an Automatic Speech…
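
The abstract describes the two metrics only informally, so the following is a minimal sketch of one plausible reading, not the paper's definitions. It assumes that a small labeled subset is available for evaluation, that purity is the share of a cluster's items coming from its most frequent (dominant) speaker, and that uniqueness is the share of clusters whose dominant speaker is not also the dominant speaker of another cluster. The function names `cluster_purity` and `cluster_uniqueness` are hypothetical.

```python
from collections import Counter, defaultdict


def cluster_purity(cluster_ids, speaker_ids):
    """Return (purity, dominant) dicts keyed by cluster id.

    Assumed definition: purity = items from the dominant speaker divided
    by all items in the cluster.
    """
    members = defaultdict(list)
    for cluster, speaker in zip(cluster_ids, speaker_ids):
        members[cluster].append(speaker)

    purity, dominant = {}, {}
    for cluster, speakers in members.items():
        speaker, count = Counter(speakers).most_common(1)[0]
        purity[cluster] = count / len(speakers)
        dominant[cluster] = speaker
    return purity, dominant


def cluster_uniqueness(dominant):
    """Assumed definition: fraction of clusters whose dominant speaker
    dominates no other cluster."""
    if not dominant:
        raise ValueError("dominant must not be empty")
    per_speaker = Counter(dominant.values())
    unique = sum(1 for s in dominant.values() if per_speaker[s] == 1)
    return unique / len(dominant)


# Hypothetical evaluation set: three clusters, two true speakers.
clusters = [0, 0, 0, 1, 1, 2, 2, 2]
speakers = ["a", "a", "b", "b", "b", "a", "a", "a"]
purity, dominant = cluster_purity(clusters, speakers)
uniqueness = cluster_uniqueness(dominant)
```

In this toy example cluster 0 has purity 2/3 and clusters 1 and 2 have purity 1. Speaker "a" dominates two clusters, so only cluster 1 counts as unique and uniqueness is 1/3. High purity with low uniqueness would suggest that one speaker has been split across several clusters.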


Code & Models

Repositories

Open-Speech-EkStep/vakyansh-wav2vec2-experimentation (PyTorch, official)


Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing