Leveraging Speaker Embeddings with Adversarial Multi-task Learning for Age Group Classification

Kwangje Baeg; Yeong-Gwan Kim; Young-Sub Han; Byoung-Ki Jeon

arXiv:2301.09058·eess.AS·January 24, 2023

TL;DR

This paper proposes an adversarial multi-task learning approach for classifying age groups from speaker embeddings. It targets two problems: embeddings trained heavily for speaker discrimination represent age group poorly, and features differ across age subgroups (domain discrepancy).

Contribution

It introduces a novel adversarial multi-task learning framework that enhances age classification accuracy by aligning speaker embeddings across age groups.

Findings

01. Improved age group classification accuracy on the VoxCeleb Enrichment dataset.

02. Effective domain adaptation using adversarial training.

03. Comparison of different speaker embedding types for age group classification.

Abstract

Recently, researchers have utilized neural network-based speaker embedding techniques in speaker-recognition tasks to identify speakers accurately. However, speaker-discriminative embeddings do not always represent speech features such as age group well. In an embedding model that has been highly trained to capture speaker traits, the task of age group classification is closer to speech information leakage. Hence, to improve age group classification performance, we consider the use of speaker-discriminative embeddings derived from adversarial multi-task learning to align features and reduce the domain discrepancy in age subgroups. In addition, we investigated different types of speaker embeddings to learn and generalize the domain-invariant representations for age groups. Experimental results on the VoxCeleb Enrichment dataset verify the effectiveness of our proposed adaptive…
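The abstract does not say how the adversarial branch is trained. One common way to realize adversarial multi-task learning is gradient reversal, and the sketch below illustrates only that general idea on a single scalar example. Everything in it is an assumption made for illustration, not the authors' implementation: the function `adversarial_multitask_grads`, the `Grads` container, the squared-error losses, the choice of a generic "domain" target for the adversary, and the weight `lam`.

```python
from dataclasses import dataclass


@dataclass
class Grads:
    encoder: float    # gradient applied to the shared encoder weight
    age_head: float   # gradient applied to the age head weight
    adversary: float  # gradient applied to the adversary (domain) head weight


def adversarial_multitask_grads(x, y_age, y_dom, w, a, d, lam=1.0):
    """Gradients for one scalar example with squared-error losses.

    Hypothetical toy model: h = w * x is the shared embedding, a * h
    predicts the age target, and d * h predicts a domain target. The
    adversary head descends its own loss as usual, while the encoder
    receives that loss's gradient with the sign flipped and scaled by
    lam (gradient reversal).
    """
    h = w * x
    err_age = a * h - y_age
    err_dom = d * h - y_dom

    # Plain gradients of each squared-error loss.
    d_age_dw = 2.0 * err_age * a * x
    d_dom_dw = 2.0 * err_dom * d * x
    d_age_da = 2.0 * err_age * h
    d_dom_dd = 2.0 * err_dom * h

    return Grads(
        encoder=d_age_dw - lam * d_dom_dw,  # reversed adversary gradient
        age_head=d_age_da,
        adversary=d_dom_dd,
    )
```

Setting `lam=0.0` recovers ordinary single-task training of the encoder. A larger `lam` pushes the shared embedding harder toward being uninformative for the adversary's target, which is the sense in which such training can reduce domain discrepancy.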

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Voice and Speech Disorders

Methods: ALIGN