Leveraging Speaker Embeddings with Adversarial Multi-task Learning for Age Group Classification
Kwangje Baeg, Yeong-Gwan Kim, Young-Sub Han, Byoung-Ki Jeon

TL;DR
This paper proposes adversarial multi-task learning over speaker embeddings to improve age group classification. It treats age prediction from speaker-discriminative embeddings as a form of speech information leakage and aims to reduce the domain discrepancy across age subgroups.
Contribution
It introduces a novel adversarial multi-task learning framework that enhances age classification accuracy by aligning speaker embeddings across age groups.
Findings
Improved age group classification accuracy on the VoxCeleb Enrichment dataset.
Effective domain adaptation using adversarial training.
Comparison of different speaker embeddings for age classification.
Abstract
Recently, researchers have utilized neural network-based speaker embedding techniques in speaker-recognition tasks to identify speakers accurately. However, speaker-discriminative embeddings do not always represent speech features such as age group well. In an embedding model that has been highly trained to capture speaker traits, the task of age group classification is closer to speech information leakage. Hence, to improve age group classification performance, we consider the use of speaker-discriminative embeddings derived from adversarial multi-task learning to align features and reduce the domain discrepancy in age subgroups. In addition, we investigated different types of speaker embeddings to learn and generalize the domain-invariant representations for age groups. Experimental results on the VoxCeleb Enrichment dataset verify the effectiveness of our proposed adaptive…
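As a rough illustration of the adversarial multi-task idea described in the abstract, the sketch below combines an age-classification loss with an adversarial loss whose sign is flipped for the shared encoder. It is not the authors' implementation: the abstract does not state what the adversarial branch predicts, how the losses are weighted, or whether a gradient reversal layer is used, so the adversarial target, the weight `lam`, and all function names here are assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Negative log-likelihood of the true class index `label`."""
    return -math.log(softmax(logits)[label])


def adversarial_multitask_loss(age_logits, age_label,
                               adv_logits, adv_label, lam=0.1):
    """Objective seen by the shared encoder (hypothetical formulation).

    The encoder minimizes the age loss while maximizing the adversary's
    loss, so the adversarial term enters with a negative sign. `lam` is
    an assumed trade-off weight, not a value from the paper.
    """
    age_loss = cross_entropy(age_logits, age_label)
    adv_loss = cross_entropy(adv_logits, adv_label)
    encoder_loss = age_loss - lam * adv_loss
    return encoder_loss, age_loss, adv_loss


# Toy usage: 3 age groups, 2 classes for the assumed adversarial target.
encoder_loss, age_loss, adv_loss = adversarial_multitask_loss(
    age_logits=[2.0, 0.0, 0.0], age_label=0,
    adv_logits=[0.0, 0.0], adv_label=1,
    lam=0.5,
)
```

In a full training loop the adversarial head itself would still minimize `adv_loss`; only the shared encoder sees the flipped sign, which is what pushes its features toward being uninformative about the adversarial target.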
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Voice and Speech Disorders
Methods: ALIGN
