Towards Speaker Age Estimation with Label Distribution Learning
Shijing Si, Jianzong Wang, Junqing Peng, Jing Xiao

TL;DR
This paper introduces a label distribution learning (LDL) approach to speaker age estimation that models label ambiguity directly. It combines classification and regression to improve accuracy and robustness, and reports a 10% reduction in mean absolute error (MAE) on real-world data.
Contribution
The paper proposes a novel LDL-based method for speaker age estimation that models age labels as distributions, effectively handling label ambiguity and improving performance.
Findings
Outperforms baseline methods with a 10% MAE reduction on real-world data.
Combines classification and regression for more robust age estimation.
Demonstrates effectiveness on public and real-world datasets.
Abstract
Existing methods for speaker age estimation usually treat it as a multi-class classification or a regression problem. However, precise age identification remains a challenge due to label ambiguity, i.e., utterances from adjacent ages of the same person are often indistinguishable. To address this, we utilize the ambiguous information among the age labels, convert each age label into a discrete label distribution, and leverage the label distribution learning (LDL) method to fit the data. For each audio sample, our method produces an age distribution for its speaker, and on top of the distribution we also perform two other tasks: age prediction and age uncertainty minimization. Therefore, our method naturally combines the age classification and regression approaches, which enhances the robustness of our method. We conduct experiments on the public NIST SRE08-10 dataset and a…
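The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify the shape of the label distribution or the loss terms, so the discretized Gaussian target, the KL-divergence fitting loss, the absolute-error term on the expected age, the variance penalty, and every function name, parameter, and weight (`sigma`, `w_l1`, `w_var`) below are assumptions chosen for illustration.

```python
import math

def age_label_distribution(true_age, ages, sigma=2.0):
    """Turn a single age label into a discrete distribution over `ages`.
    Assumption: a discretized Gaussian centered on the true age."""
    weights = [math.exp(-((a - true_age) ** 2) / (2.0 * sigma ** 2)) for a in ages]
    total = sum(weights)
    return [w / total for w in weights]

def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted); assumed here as the distribution-fitting loss."""
    return sum(t * math.log((t + eps) / (p + eps)) for t, p in zip(target, predicted))

def expected_age(dist, ages):
    """Age prediction: the mean of the predicted distribution."""
    return sum(p * a for p, a in zip(dist, ages))

def age_variance(dist, ages):
    """Uncertainty: the variance of the predicted distribution."""
    mean = expected_age(dist, ages)
    return sum(p * (a - mean) ** 2 for p, a in zip(dist, ages))

def combined_loss(logits, true_age, ages, sigma=2.0, w_l1=1.0, w_var=0.05):
    """Hypothetical weighted sum of the three tasks named in the abstract:
    distribution fitting, age prediction, and uncertainty minimization."""
    target = age_label_distribution(true_age, ages, sigma)
    predicted = softmax(logits)
    fit_term = kl_divergence(target, predicted)
    age_term = abs(expected_age(predicted, ages) - true_age)
    uncertainty_term = age_variance(predicted, ages)
    return fit_term + w_l1 * age_term + w_var * uncertainty_term

# Usage: score two hypothetical model outputs for a 40-year-old speaker.
ages = list(range(20, 81))
good_logits = [-((a - 40) ** 2) / 8.0 for a in ages]  # peaked at the true age
bad_logits = [-((a - 60) ** 2) / 8.0 for a in ages]   # peaked 20 years off
loss_good = combined_loss(good_logits, 40, ages)
loss_bad = combined_loss(bad_logits, 40, ages)
```

In this sketch the fitting term plays the role of classification over age bins, while the expected-age term plays the role of regression, which is one way to read the abstract's claim that the method combines both. Output peaked at the true age yields a lower loss than output peaked 20 years away.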
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
