EACELEB: An East Asian Language Speaking Celebrity Dataset for Speaker Recognition

Desmond Caulley; Yufeng Yang; David Anderson

arXiv:2203.05333 · cs.SD · March 11, 2022

TL;DR

This paper introduces EACELEB, a new East Asian celebrity speaker dataset created using an efficient audio-visual data collection pipeline from YouTube, achieving competitive speaker recognition accuracy.

Contribution

It presents a novel fast data acquisition method using face-tracking for East Asian celebrities, and demonstrates its effectiveness for speaker recognition tasks.

Findings

01

Achieved approximately 4% equal error rate after diarization and fine-tuning.

02

Developed a scalable pipeline for collecting celebrity audio data from YouTube.

03

Showed equal error rates comparable to VoxCeleb on East Asian celebrity data.
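The equal error rate (EER) cited in the findings is the operating point at which the false-accept rate and the false-reject rate of a verification system are equal. The following is a minimal sketch of how it can be estimated from trial scores, assuming a simple threshold sweep over the observed scores; the function name `equal_error_rate` and the toy scores are illustrative assumptions and are not taken from the paper or its repository.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Estimate the EER by sweeping a threshold over all observed scores.

    A trial is accepted when its score is >= the threshold.
    Returns (eer, threshold) at the point where the false-accept rate
    and the false-reject rate are closest to each other.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best = None  # (gap, eer_estimate, threshold)
    for t in thresholds:
        # Same-speaker trials rejected at this threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # Different-speaker trials accepted at this threshold.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy scores, invented purely for illustration (not results from the paper).
same_speaker = [0.9, 0.8, 0.7, 0.4]
different_speaker = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(same_speaker, different_speaker)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

With only a handful of trials the estimate is coarse; evaluation toolkits typically interpolate between neighbouring thresholds rather than picking the nearest one.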

Abstract

Large datasets are very useful for training speaker recognition systems, and various research groups have constructed several over the years. VoxCeleb is a large dataset for speaker recognition that is extracted from YouTube videos. This paper presents an audio-visual method for acquiring audio data from YouTube given the speaker's name as input. The system follows a pipeline similar to that of the VoxCeleb data acquisition method. However, our work focuses on fast data acquisition by using face-tracking in subsequent frames once a face has been detected; this is preferable to running face detection on every frame, given its computational cost. We show that applying audio diarization to our data after acquiring it can yield equal error rates comparable to VoxCeleb. A secondary set of experiments showed that we could further decrease the error rate by fine-tuning a pre-trained x-vector…
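The detect-once-then-track idea described in the abstract can be sketched as a simple control loop: run the expensive detector only when no face is currently being followed, and use a cheaper tracker on the frames in between. This is a minimal sketch under stated assumptions, not the authors' implementation; `detect_then_track`, `detect`, and `track` are hypothetical names, and the detector and tracker are passed in as plain callables so that no particular vision library is implied.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height); hypothetical layout


def detect_then_track(
    frames: Iterable[object],
    detect: Callable[[object], Optional[Box]],
    track: Callable[[object, Box], Optional[Box]],
) -> Tuple[List[Optional[Box]], int]:
    """Follow one face through a sequence of frames.

    `detect` stands in for a costly face detector; `track` stands in for a
    cheap tracker that updates the previous box or returns None when the
    face is lost. Returns the per-frame boxes and the number of detector
    calls that were needed.
    """
    boxes: List[Optional[Box]] = []
    detector_calls = 0
    current: Optional[Box] = None
    for frame in frames:
        if current is not None:
            # Cheap path: update the existing track.
            current = track(frame, current)
        if current is None:
            # Expensive path: no active track, so run the detector.
            detector_calls += 1
            current = detect(frame)
        boxes.append(current)
    return boxes, detector_calls


# Toy stand-ins: each "frame" is simply the true face box, or None if no face.
toy_frames = [(10, 10, 20, 20), (12, 10, 20, 20), (14, 11, 20, 20),
              None, (50, 10, 20, 20), (52, 10, 20, 20)]
toy_detect = lambda frame: frame
toy_track = lambda frame, prev: (
    frame if frame is not None and abs(frame[0] - prev[0]) <= 5 else None
)

boxes, calls = detect_then_track(toy_frames, toy_detect, toy_track)
print(f"{calls} detector calls for {len(toy_frames)} frames")
```

In this toy run the detector is invoked only at the start of a track and after the face is lost, which is the source of the speed-up over per-frame detection.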

Code & Models

Repositories

dcaulley/av_diarization · tf · Official

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Face recognition and analysis