Causally Disentangled Contrastive Learning for Multilingual Speaker Embeddings

Mariëtte Olijslager; Seyed Sahand Mohammadi Ziabari; Ali Mohammed Mansoor Alsahag

arXiv:2602.01363·cs.SD·February 3, 2026

Open Access

TL;DR

This paper explores how demographic attributes like gender, age, and accent are encoded in self-supervised speaker embeddings and evaluates methods to reduce this leakage without significantly harming speaker verification performance.

Contribution

The paper studies and compares two debiasing strategies, adversarial training through gradient reversal and a causal bottleneck architecture, for reducing demographic information in speaker embeddings.

Findings

01. Gender information is strongly and linearly encoded in the embeddings.

02. Adversarial debiasing reduces gender leakage but affects verification accuracy.

03. The causal bottleneck suppresses demographic information but degrades verification performance.

Abstract

Self-supervised speaker embeddings are widely used in speaker verification systems, but prior work has shown that they often encode sensitive demographic attributes, raising fairness and privacy concerns. This paper investigates the extent to which demographic information, specifically gender, age, and accent, is present in SimCLR-trained speaker embeddings and whether such leakage can be mitigated without severely degrading speaker verification performance. We study two debiasing strategies: adversarial training through gradient reversal and a causal bottleneck architecture that explicitly separates demographic and residual information. Demographic leakage is quantified using both linear and nonlinear probing classifiers, while speaker verification performance is evaluated using ROC-AUC and EER. Our results show that gender information is strongly and linearly encoded in baseline…
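
The abstract names EER (equal error rate) as one of its verification metrics. As an illustration only, the sketch below shows one common way to estimate EER from trial scores: sweep a decision threshold and take the point where the false-accept and false-reject rates are closest. The function name `equal_error_rate`, the toy scores, and the convention that a higher score means "same speaker" are assumptions made for this example, not details taken from the paper.

```python
def equal_error_rate(scores, labels):
    """Estimate the EER from verification trial scores.

    scores: similarity scores, higher = more likely the same speaker (assumed).
    labels: 1 for same-speaker trials, 0 for different-speaker trials.
    Returns (eer, threshold) at the threshold where the false-accept rate
    and false-reject rate are closest.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative trial")

    best = None  # (gap, eer, threshold)
    for t in sorted(set(scores)):
        # A trial is accepted when its score is at or above the threshold.
        false_accepts = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= t)
        false_rejects = sum(1 for s, y in zip(scores, labels) if y == 1 and s < t)
        far = false_accepts / n_neg
        frr = false_rejects / n_pos
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy trials (hypothetical data): four same-speaker and four different-speaker pairs.
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
eer, threshold = equal_error_rate(toy_scores, toy_labels)
print(f"EER = {eer:.2f} at threshold {threshold}")  # EER = 0.25 at threshold 0.6
```

On real trial lists the two error rates rarely cross exactly at an observed score, so evaluation toolkits often interpolate along the ROC curve; this sketch simply averages the two rates at the closest threshold.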

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Adversarial Robustness in Machine Learning