Study on Inter and Intra Speaker Variability in Speaker Recognition

Anton Okhotnikov, Nikita Torgashov, Ivan Yakovlev, Pavel Malov, and Rostislav Makarov

arXiv:2411.07754 · eess.AS · November 13, 2024

TL;DR

This paper analyzes how inter- and intra-speaker variability in training data affect neural network-based speaker recognition systems, offers guidelines for data collection, and releases upload date metadata for the VoxTube dataset to support research practice.

Contribution

It analyzes the dependency between inter- and intra-speaker variability in training data and releases per-utterance upload date metadata for the VoxTube dataset to support data collection practices.

Findings

01. Analysis of the dependency between inter- and intra-speaker variability

02. Release of upload date metadata for the VoxTube dataset

03. Guidelines for data collection and filtering

Abstract

Optimizing the trade-off between the number of speakers and their temporal variability (or session diversity) is crucial for developing a speaker recognition system while keeping the data collection process feasible in terms of time. In this article, we analyze the dependency between inter- and intra-speaker variability in training data for a modern neural network-based speaker recognition system, using the VoxTube dataset for the text-independent speaker recognition task. In addition, as an auxiliary contribution, we release per-utterance upload date metadata for the VoxTube dataset. We hope this article contributes to guidelines and best practices for collecting and filtering data from media hosting platforms, facilitating researchers' efforts to develop speaker recognition systems.
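The abstract frames data collection as a trade-off between how many speakers are collected and how much temporal (session) diversity each speaker has. The sketch below illustrates how per-utterance upload dates could be used to build two training subsets of equal size, one favoring more speakers and one favoring more sessions per speaker. It rests on several assumptions that do not come from the paper. The record layout (`Utterance` with `speaker_id`, `session_id`, `upload_date`) is hypothetical and does not reflect VoxTube's actual schema. The helpers `select_subset` and `temporal_span_days` are also hypothetical, as is the heuristic of picking sessions evenly spaced in time. This is not the authors' experimental procedure.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Utterance:
    # Hypothetical record layout for illustration; not VoxTube's actual metadata schema.
    utt_id: str
    speaker_id: str
    session_id: str      # assumed proxy for a recording session (e.g. one source video)
    upload_date: date    # per-utterance upload date


def temporal_span_days(utts):
    """Days between each speaker's earliest and latest upload date."""
    dates = defaultdict(list)
    for u in utts:
        dates[u.speaker_id].append(u.upload_date)
    return {spk: (max(ds) - min(ds)).days for spk, ds in dates.items()}


def select_subset(utts, n_speakers, sessions_per_speaker):
    """Keep up to n_speakers speakers with sessions_per_speaker sessions each.

    Sessions are picked evenly spaced in upload-date order (a hypothetical
    heuristic), so each kept speaker spans as wide a time range as possible.
    """
    if sessions_per_speaker < 1:
        raise ValueError("sessions_per_speaker must be >= 1")
    by_spk = defaultdict(lambda: defaultdict(list))
    for u in utts:
        by_spk[u.speaker_id][u.session_id].append(u)

    k = sessions_per_speaker
    # Only speakers with at least k sessions qualify; sort for determinism.
    eligible = sorted(s for s, sess in by_spk.items() if len(sess) >= k)

    chosen = []
    for spk in eligible[:n_speakers]:
        # Order this speaker's sessions by their earliest upload date.
        sessions = sorted(by_spk[spk].values(),
                          key=lambda us: min(u.upload_date for u in us))
        n = len(sessions)
        # Integer spacing: distinct indices that include the first and last session.
        idx = [0] if k == 1 else [i * (n - 1) // (k - 1) for i in range(k)]
        for i in idx:
            chosen.extend(sessions[i])
    return chosen


if __name__ == "__main__":
    # Synthetic toy data (not from VoxTube): 4 speakers x 6 sessions x 2 utterances.
    utts = [
        Utterance(f"{s}-{v}-{j}", s, f"{s}-v{v}", date(2020, 1 + 2 * v, 1))
        for s in ["spkA", "spkB", "spkC", "spkD"]
        for v in range(6)
        for j in range(2)
    ]
    wide = select_subset(utts, n_speakers=4, sessions_per_speaker=2)  # more speakers
    deep = select_subset(utts, n_speakers=2, sessions_per_speaker=4)  # more sessions each
    print(len(wide), len(deep))  # 16 16: same utterance budget, different trade-off
    print(temporal_span_days(wide), temporal_span_days(deep))
```

Both subsets contain the same number of utterances, so a study along these lines could train one model per configuration and compare them at a fixed data budget. Even spacing is only one possible choice. Random session sampling or a minimum gap between upload dates are equally plausible filters.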

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Advanced Algorithms and Applications