Cross-Age Speaker Verification: Learning Age-Invariant Speaker Embeddings
Xiaoyi Qin, Na Li, Chao Weng, Dan Su, Ming Li

TL;DR
This paper addresses the challenge of cross-age speaker verification by creating age-labeled datasets using face-based age estimation, and proposes an age-invariant embedding learning method that significantly improves verification accuracy across age gaps.
Contribution
The paper introduces a novel age-invariant speaker representation learning method and constructs new cross-age test sets for speaker verification research.
Findings
Baseline performance drops significantly with age gaps.
The proposed method reduces EER by over 10% on the challenging cross-age test set.
Multiple cross-age test sets are constructed from the VoxCeleb dataset.
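The findings above are stated in terms of equal error rate (EER), the operating point where the false acceptance rate and false rejection rate are equal. As a rough illustration only, the sketch below estimates EER from two lists of trial scores by sweeping thresholds over the observed values. The function name `equal_error_rate` and the example scores are hypothetical; this is not the paper's evaluation code, and standard toolkits usually interpolate the ROC curve instead.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping thresholds over all observed scores.

    target_scores: scores for same-speaker (positive) trials.
    nontarget_scores: scores for different-speaker (negative) trials.
    Higher scores are assumed to mean "more likely the same speaker".
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # False rejection rate: positive trials scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance rate: negative trials scored at or above it.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        # Keep the threshold where the two error rates are closest.
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


# Toy scores, invented purely for illustration.
toy_eer = equal_error_rate([0.9, 0.8, 0.7, 0.3], [0.6, 0.2, 0.1, 0.4])
```

A lower EER means fewer verification errors at the balanced operating point, so a drop in EER on a cross-age test set indicates embeddings that are less sensitive to the age gap.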
Abstract
Automatic speaker verification has achieved remarkable progress in recent years. However, there is little research on cross-age speaker verification (CASV) due to insufficient relevant data. In this paper, we mine cross-age test sets based on the VoxCeleb dataset and propose our age-invariant speaker representation (AISR) learning method. Since VoxCeleb is collected from the YouTube platform, the dataset inherently contains cross-age data. However, the metadata does not contain speaker age labels. Therefore, we adopt a face age estimation method to predict the speaker's age from the associated visual data, then label the audio recording with the estimated age. We construct multiple Cross-Age test sets on VoxCeleb (Vox-CA), deliberately selecting positive trials with a large age gap. The effect of nationality and gender is also considered in selecting negative pairs…
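The trial-mining idea described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the `Utterance` record, the `build_trials` function, and the `min_age_gap=10` threshold are all hypothetical, and "considering nationality and gender" is interpreted here as pairing only different speakers who share both attributes, which the excerpt does not confirm.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Utterance:
    utt_id: str
    speaker: str
    est_age: float    # age estimated from the associated face video (assumed given)
    nationality: str
    gender: str


def build_trials(utts, min_age_gap=10):
    """Return (positives, negatives) as lists of (utt_a, utt_b, label) tuples.

    min_age_gap is a hypothetical threshold in years.
    """
    positives, negatives = [], []
    for a, b in combinations(utts, 2):
        if a.speaker == b.speaker:
            # Positive trial: same speaker, recordings far apart in estimated age.
            if abs(a.est_age - b.est_age) >= min_age_gap:
                positives.append((a.utt_id, b.utt_id, 1))
        elif a.nationality == b.nationality and a.gender == b.gender:
            # Negative trial (assumed rule): different speakers matched on
            # nationality and gender, so those cues cannot separate them.
            negatives.append((a.utt_id, b.utt_id, 0))
    return positives, negatives


# Toy metadata, invented purely for illustration.
toy_utts = [
    Utterance("u1", "A", 25, "UK", "f"),
    Utterance("u2", "A", 40, "UK", "f"),
    Utterance("u3", "A", 28, "UK", "f"),
    Utterance("u4", "B", 30, "UK", "f"),
    Utterance("u5", "C", 30, "US", "m"),
]
toy_pos, toy_neg = build_trials(toy_utts)
```

Raising `min_age_gap` would yield fewer but harder positive trials, which is one plausible way to obtain several test sets of increasing difficulty.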
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Face recognition and analysis
Methods: Test · ALIGN
