Disentangling Age and Identity with a Mutual Information Minimization Approach for Cross-Age Speaker Verification

Fengrun Zhang; Wangjin Zhou; Yiming Liu; Wang Geng; Yahui Shan; Chen Zhang

arXiv:2409.15974·cs.SD·September 25, 2024

TL;DR

This paper introduces a novel disentangled representation learning framework for cross-age speaker verification that minimizes mutual information to produce age-invariant speaker embeddings, improving performance across age gaps.

Contribution

The paper presents a mutual information minimization approach to disentangle age and identity features, with an aging-aware loss function for better cross-age speaker verification.

Findings

01. Outperforms existing methods on Vox-CA cross-age test sets

02. Produces age-invariant speaker embeddings

03. Effective in handling large age gaps

Abstract

There has been an increasing research interest in cross-age speaker verification (CASV). However, existing speaker verification systems perform poorly in CASV due to the great individual differences in voice caused by aging. In this paper, we propose a disentangled representation learning framework for CASV based on mutual information (MI) minimization. In our method, a backbone model is trained to disentangle the identity- and age-related embeddings from speaker information, and an MI estimator is trained to minimize the correlation between age- and identity-related embeddings via MI minimization, resulting in age-invariant speaker embeddings. Furthermore, by using the age gaps between positive and negative samples, we propose an aging-aware MI minimization loss function that allows the backbone model to focus more on the vocal changes with large age gaps. Experimental results show…
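
To make the idea of penalizing dependence between age and identity embeddings concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not say which MI estimator is used, so a CLUB-style sampled upper bound with a unit-variance Gaussian is swapped in as an assumption. The names `club_mi_upper_bound`, `age_gap_weights`, `predict_mean`, and `alpha`, and the linear weighting by age gap, are all hypothetical and chosen only for illustration.

```python
def age_gap_weights(age_gaps, alpha=1.0):
    """Hypothetical weighting: samples with larger age gaps get larger weights.

    The linear form 1 + alpha * gap / max_gap is an assumption, not the
    paper's aging-aware loss.
    """
    max_gap = max(age_gaps) or 1  # avoid division by zero when all gaps are 0
    return [1.0 + alpha * gap / max_gap for gap in age_gaps]


def club_mi_upper_bound(id_emb, age_emb, predict_mean, weights=None):
    """CLUB-style sampled estimate of MI between identity and age embeddings.

    For each sample i, compare the log-likelihood of the matched pair
    log q(a_i | z_i) with the average over all pairings log q(a_j | z_i).
    q is assumed to be a unit-variance Gaussian centred at predict_mean(z).
    """
    n = len(id_emb)
    if weights is None:
        weights = [1.0] * n

    def log_q(age_vec, id_vec):
        # Gaussian log-density up to an additive constant, which cancels
        # in the positive-minus-negative difference below.
        mean = predict_mean(id_vec)
        return -0.5 * sum((a - m) ** 2 for a, m in zip(age_vec, mean))

    total = 0.0
    for i in range(n):
        positive = log_q(age_emb[i], id_emb[i])
        negative = sum(log_q(age_emb[j], id_emb[i]) for j in range(n)) / n
        total += weights[i] * (positive - negative)
    return total / sum(weights)


# Toy data: age embeddings that copy the identity embeddings (fully dependent).
identity = [[0.0, 1.0], [1.0, 0.0], [2.0, 2.0]]
weights = age_gap_weights([0, 5, 10], alpha=1.0)
loss = club_mi_upper_bound(identity, identity, lambda z: z, weights)
```

In a training loop, a term like `loss` would be added to the speaker verification objective so that the backbone is pushed toward identity embeddings that carry little information about age; the weighting only illustrates how larger age gaps could be emphasized.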


Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing
