VoxAging: Continuously Tracking Speaker Aging with a Large-Scale Longitudinal Dataset in English and Mandarin

Zhiqi Ai; Meixuan Bao; Zhiyong Chen; Zhi Yang; Xinnuo Li; Shugong Xu

arXiv:2505.21445·cs.SD·May 28, 2025

TL;DR

VoxAging is a large-scale longitudinal dataset of 293 English and Mandarin speakers, recorded at weekly intervals over spans of up to 17 years. It enables detailed study of how speaker aging affects speaker verification systems.

Contribution

This paper introduces VoxAging, the first extensive longitudinal dataset for speaker aging research, covering multiple languages and detailed individual aging trajectories.

Findings

01

Aging impacts speaker verification performance.

02

Gender and age group influence aging patterns.

03

Longitudinal data reveals individual speaker aging processes.

Abstract

The performance of speaker verification systems is adversely affected by speaker aging. However, due to challenges in data collection, particularly the lack of sustained and large-scale longitudinal data for individuals, research on speaker aging remains difficult. In this paper, we present VoxAging, a large-scale longitudinal dataset collected from 293 speakers (226 English speakers and 67 Mandarin speakers) over several years, with the longest time span reaching 17 years (approximately 900 weeks). For each speaker, the data were recorded at weekly intervals. We studied the phenomenon of speaker aging and its effects on advanced speaker verification systems, analyzed individual speaker aging processes, and explored the impact of factors such as age group and gender on speaker aging research.
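Because every speaker is recorded weekly, one natural way to look at aging is to compare a speaker's later sessions against an early enrollment session and track how similarity changes as the time gap grows. The sketch below illustrates that idea under stated assumptions. It assumes each session has already been mapped to a fixed-length speaker embedding by some verification model. The `sessions` layout, the function names `cosine` and `similarity_by_gap`, and the 52-week bin width are all hypothetical. They do not describe the VoxAging release format or the authors' evaluation protocol.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_by_gap(sessions, enroll_week=0, bin_weeks=52):
    """Mean cosine similarity to the enrollment session, grouped by time gap.

    sessions: dict mapping week index -> embedding (list of floats) for ONE
    speaker. This layout is a hypothetical illustration, not the dataset's format.
    Returns {bin_index: mean_similarity}, where bin k covers gaps in
    [k * bin_weeks, (k + 1) * bin_weeks) weeks.
    """
    enroll = sessions[enroll_week]
    buckets = defaultdict(list)
    for week, emb in sessions.items():
        gap = week - enroll_week
        if gap <= 0:
            continue  # skip the enrollment session itself and any earlier weeks
        buckets[gap // bin_weeks].append(cosine(enroll, emb))
    # Average the scores in each gap bin, returned in increasing-gap order.
    return {k: sum(v) / len(v) for k, v in sorted(buckets.items())}


if __name__ == "__main__":
    # Toy 2-D "embeddings" that drift away from the enrollment vector over time.
    toy = {0: [1.0, 0.0], 10: [1.0, 0.0], 60: [1.0, 1.0]}
    print(similarity_by_gap(toy))
```

On the toy input, the week-10 session falls in gap bin 0 with similarity 1.0, and the week-60 session falls in bin 1 with similarity of about 0.707. A steadily falling curve of this kind is the sort of aging signal a longitudinal dataset makes measurable. In a real study one would more likely score many target and non-target trials per gap bin and report an error rate such as EER rather than a raw mean similarity.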

Code & Models

Datasets

ZhiqiAi/voxaging
dataset · 16 dl

Taxonomy

Topics: Speech Recognition and Synthesis