Cross-Age Speaker Verification: Learning Age-Invariant Speaker Embeddings

Xiaoyi Qin; Na Li; Chao Weng; Dan Su; Ming Li

arXiv:2207.05929·eess.AS·July 14, 2022

TL;DR

This paper addresses the challenge of cross-age speaker verification by creating age-labeled datasets using face-based age estimation, and proposes an age-invariant embedding learning method that significantly improves verification accuracy across age gaps.

Contribution

The paper introduces a novel age-invariant speaker representation learning method and constructs new cross-age test sets for speaker verification research.

Findings

01

Baseline performance drops significantly on trials with an age gap.

02

The proposed method reduces EER by over 10% on a challenging cross-age test set.

03

Constructed multiple cross-age test sets based on VoxCeleb dataset.
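
The findings above are stated in terms of equal error rate (EER), the operating point where the false-acceptance and false-rejection rates are equal. The following is a minimal sketch of how EER can be computed from trial scores, not the authors' evaluation code. The function names `compute_eer` and `relative_reduction` are hypothetical, and this page does not say whether the reported 10% is a relative or absolute reduction; the helper below shows the relative reading only.

```python
import numpy as np

def compute_eer(scores, labels):
    """Approximate the equal error rate (EER) of a set of verification trials.

    scores: similarity scores, higher means "more likely the same speaker".
    labels: 1 for target (same-speaker) trials, 0 for non-target trials.
    Tied scores are not handled specially; this is a simple illustration.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)

    # Sort trials from highest to lowest score, then sweep the threshold
    # so that the top-k trials are accepted for k = 0..N.
    order = np.argsort(-scores)
    labels = labels[order]

    n_target = labels.sum()
    n_nontarget = len(labels) - n_target

    # False accepts: non-targets among the accepted top-k trials.
    false_accepts = np.concatenate(([0], np.cumsum(1 - labels)))
    # False rejects: targets that fall outside the accepted top-k trials.
    false_rejects = n_target - np.concatenate(([0], np.cumsum(labels)))

    far = false_accepts / n_nontarget
    frr = false_rejects / n_target

    # Take the threshold where the two error rates are closest.
    idx = np.argmin(np.abs(far - frr))
    return float((far[idx] + frr[idx]) / 2)

def relative_reduction(baseline_eer, new_eer):
    """Relative EER reduction of a new system over a baseline (assumed reading)."""
    return (baseline_eer - new_eer) / baseline_eer
```

With this helper, a system whose target trials all score above its non-target trials has an EER of 0, and interleaved scores push the EER up.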

Abstract

Automatic speaker verification has achieved remarkable progress in recent years. However, there is little research on cross-age speaker verification (CASV) due to insufficient relevant data. In this paper, we mine cross-age test sets based on the VoxCeleb dataset and propose our age-invariant speaker representation (AISR) learning method. Since VoxCeleb is collected from the YouTube platform, the dataset inherently contains cross-age data. However, the metadata does not contain speaker age labels. Therefore, we adopt a face age estimation method to predict the speaker's age from the associated visual data, then label the audio recording with the estimated age. We construct multiple Cross-Age test sets on VoxCeleb (Vox-CA), which deliberately select positive trials with a large age gap. Also, the effect of nationality and gender is considered in selecting negative pairs…
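
The positive-trial selection described in the abstract can be sketched as follows. This is an illustrative assumption, not the authors' pipeline: the function name `select_cross_age_positive_trials`, the `(utterance_id, speaker_id, estimated_age)` record layout, and the `min_age_gap` threshold are all hypothetical, and the estimated ages are taken as already produced by some face age estimator. Negative-pair selection is left out because the abstract is truncated before describing it.

```python
from collections import defaultdict
from itertools import combinations

def select_cross_age_positive_trials(utterances, min_age_gap):
    """Pick same-speaker utterance pairs whose estimated ages differ enough.

    utterances: iterable of (utterance_id, speaker_id, estimated_age) tuples
                (hypothetical layout; ages are assumed to come from a face
                age estimator applied to the associated video).
    min_age_gap: minimum absolute age difference, in years, for a pair to
                 be kept as a cross-age positive trial (assumed parameter).
    """
    # Group utterances by speaker so that only same-speaker pairs are formed.
    by_speaker = defaultdict(list)
    for utt_id, speaker_id, age in utterances:
        by_speaker[speaker_id].append((utt_id, age))

    trials = []
    for items in by_speaker.values():
        # Consider every unordered pair of utterances from this speaker.
        for (utt_a, age_a), (utt_b, age_b) in combinations(items, 2):
            if abs(age_a - age_b) >= min_age_gap:
                trials.append((utt_a, utt_b))
    return trials

# Toy records, invented purely for illustration.
example_utterances = [
    ("u1", "s1", 30),
    ("u2", "s1", 45),
    ("u3", "s1", 33),
    ("u4", "s2", 50),
]
example_trials = select_cross_age_positive_trials(example_utterances, min_age_gap=10)
```

Raising `min_age_gap` yields fewer but harder positive trials, which is one way a family of test sets with increasing difficulty could be built from the same pool of recordings.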

Code & Models

Repositories

qinxiaoyi/cross-age_speaker_verification (Official)

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Face recognition and analysis

Methods: Test · ALIGN