Language-Invariant Multilingual Speaker Verification for the TidyVoice 2026 Challenge

Ze Li; Xiaoxiao Miao; Juan Liu; Ming Li

arXiv:2603.08092 · eess.AS · March 10, 2026

Open Access

TL;DR

This paper introduces a language-invariant multilingual speaker verification system using a self-supervised model, adversarial training, and speech synthesis to improve cross-lingual robustness and performance in the TidyVoice 2026 Challenge.

Contribution

It proposes a novel multilingual SV system with language-invariant embeddings, combining self-supervised learning, adversarial training, and synthetic speech augmentation.

Findings

01

Fine-tuning the large-scale pretrained w2v-BERT 2.0 model yields competitive performance.

02

Language-adversarial training further enhances robustness.

03

Synthetic speech data boosts accuracy with limited data.

Abstract

Multilingual speaker verification (SV) remains challenging due to limited cross-lingual data and language-dependent information in speaker embeddings. This paper presents a language-invariant multilingual SV system for the TidyVoice 2026 Challenge. We adopt the multilingual self-supervised w2v-BERT 2.0 model as the backbone, enhanced with Layer Adapters and Multi-scale Feature Aggregation to better exploit multi-layer representations. A language-adversarial training strategy with a Gradient Reversal Layer is applied to promote language-invariant speaker embeddings. Moreover, a multilingual zero-shot text-to-speech system is used to synthesize speech in multiple languages, improving language diversity. Experimental results demonstrate that fine-tuning the large-scale pretrained model yields competitive performance, while language-adversarial training further enhances robustness. In…
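The Gradient Reversal Layer mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `GradientReversal`, the scaling factor `lam`, and the toy squared-error language loss are all assumptions chosen for illustration, and the gradients are written out by hand instead of using an autodiff framework. The layer passes the embedding through unchanged in the forward pass and flips the sign of the gradient in the backward pass, so an encoder updated with that gradient is pushed to make the language prediction worse while the language classifier itself is still trained normally.

```python
from typing import List


class GradientReversal:
    """Identity in the forward pass; multiplies gradients by -lam in the backward pass."""

    def __init__(self, lam: float = 1.0) -> None:
        # lam is a hypothetical scaling factor for the reversed gradient.
        self.lam = lam

    def forward(self, x: List[float]) -> List[float]:
        # The embedding is passed to the language classifier unchanged.
        return list(x)

    def backward(self, grad_output: List[float]) -> List[float]:
        # The gradient flowing back to the encoder is sign-flipped and scaled.
        return [-self.lam * g for g in grad_output]


def language_loss_grad(embedding: List[float], weights: List[float], target: float) -> List[float]:
    """Gradient of a toy loss L = 0.5 * (w . e - target)^2 with respect to e.

    This linear scorer stands in for a language classifier (an assumption).
    """
    score = sum(w * e for w, e in zip(weights, embedding))
    return [(score - target) * w for w in weights]


grl = GradientReversal(lam=0.5)
embedding = [1.0, 2.0]  # stand-in for a speaker embedding

features = grl.forward(embedding)
grad_from_classifier = language_loss_grad(features, weights=[1.0, -1.0], target=0.0)
grad_to_encoder = grl.backward(grad_from_classifier)

print(grad_from_classifier)  # gradient the language classifier would pass back
print(grad_to_encoder)       # reversed gradient the encoder actually receives
```

In a real training setup the same idea is usually expressed as a custom autograd function, with the speaker classification loss and the language loss summed before backpropagation.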

Taxonomy

Topics: Speech Recognition and Synthesis · Topic Modeling · Natural Language Processing Techniques