LASE: Language-Adversarial Speaker Encoding for Indic Cross-Script Identity Preservation

Venkata Pushpak Teja Menta

arXiv:2605.00777 · cs.SD · May 4, 2026

TL;DR

LASE is a speaker encoder for multilingual voice cloning that keeps a speaker's identity stable across scripts by using adversarial training to remove language information from its embeddings.

Contribution

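The paper introduces LASE, a small projection head over frozen WavLM-base-plus. It is trained with a supervised contrastive loss over voice identity and a gradient-reversal cross-entropy loss, with the goal of improving cross-script speaker identity preservation.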
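As a rough illustration of the adversarial part, a gradient-reversal layer acts as the identity on the forward pass and multiplies the incoming gradient by a negative factor on the backward pass, so the encoder is pushed to make the adversary's job harder. The sketch below is pure Python with a hand-written backward pass, not the paper's implementation. The names `GradientReversal` and `adversary_grad_wrt_input`, the scale `lam`, the 3-dimensional embedding, and the two-class adversary weights are all hypothetical choices made for the example.

```python
import math


class GradientReversal:
    """Identity on the forward pass; scales gradients by -lam on the backward pass."""

    def __init__(self, lam=1.0):
        self.lam = lam  # hypothetical reversal strength, not a value from the paper

    def forward(self, x):
        return list(x)

    def backward(self, grad_output):
        return [-self.lam * g for g in grad_output]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def adversary_grad_wrt_input(weights, x, label):
    """Gradient of softmax cross-entropy w.r.t. x, for logits = W x."""
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    probs = softmax(logits)
    # dL/dlogit_k = p_k - 1[k == label]
    dlogits = [p - (1.0 if k == label else 0.0) for k, p in enumerate(probs)]
    # dL/dx_j = sum_k dlogits_k * W[k][j]
    return [
        sum(dlogits[k] * weights[k][j] for k in range(len(weights)))
        for j in range(len(x))
    ]


grl = GradientReversal(lam=0.5)
embedding = [0.2, -0.1, 0.4]                  # toy speaker embedding
weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # toy 2-class adversary (assumption)

features = grl.forward(embedding)             # unchanged on the way in
grad_adv = adversary_grad_wrt_input(weights, features, label=0)
grad_enc = grl.backward(grad_adv)             # sign-flipped and scaled on the way out
```

The adversary itself would still descend along `grad_adv` with respect to its own weights; only the gradient that flows back into the encoder is reversed.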

Findings

01. LASE reduces the gap in speaker similarity across scripts to near zero.

02. LASE outperforms baselines in cross-script speaker recall with significantly less training data.

03. LASE amplifies the margin between cross-script and floor speaker similarity by 2.4-2.7 times.

Abstract

A speaker encoder used in multilingual voice cloning should treat the same speaker identically regardless of which script the audio was uttered in. Off-the-shelf encoders do not, and the failure is accent-conditional. On a 1043-pair Western-accented voice corpus across English, Hindi, Telugu, and Tamil, WavLM-base-plus-sv loses 0.082 absolute cosine similarity when the same voice changes script and ECAPA-TDNN loses 0.105. On a 1369-pair Indian-accented voice corpus, the gap shrinks to 0.006 (WavLM-SV) and 0.044 (ECAPA-TDNN). The leak is largest where it matters most for cross-script TTS: when a system projects a non-Indic-trained voice into Indic scripts. We present LASE (Language-Adversarial Speaker Encoder), a small projection head over frozen WavLM-base-plus trained with two losses: a supervised contrastive loss over voice identity, and a gradient-reversal cross-entropy against a…
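One plausible reading of the cross-script gap reported in the abstract is the mean cosine similarity of same-voice, same-script pairs minus the mean cosine similarity of same-voice, cross-script pairs. The sketch below computes that quantity in pure Python. The function name `script_gap`, the `(voice_id, script, embedding)` tuple layout, and the toy vectors are assumptions made for illustration; the paper's exact pairing protocol may differ.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def script_gap(utterances):
    """Mean same-script similarity minus mean cross-script similarity.

    utterances: list of (voice_id, script, embedding) tuples.
    Only pairs that share a voice_id are compared.
    """
    same, cross = [], []
    for (v1, s1, e1), (v2, s2, e2) in combinations(utterances, 2):
        if v1 != v2:
            continue  # different voices do not enter the gap
        (same if s1 == s2 else cross).append(cosine(e1, e2))
    if not same or not cross:
        raise ValueError("need at least one same-script and one cross-script pair")
    return sum(same) / len(same) - sum(cross) / len(cross)


# Toy data: one voice, two utterances per script (values are made up).
toy = [
    ("v1", "en", [1.0, 0.0]),
    ("v1", "en", [1.0, 0.0]),
    ("v1", "hi", [0.6, 0.8]),
    ("v1", "hi", [0.6, 0.8]),
]
gap = script_gap(toy)  # roughly 0.4 for this toy data
```

Under this reading, an encoder that ignores script entirely would give a gap close to zero, since changing script would not move the embedding.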

Code & Models

Repositories

praxelhq/lase (GitHub)
