When Spoof Detectors Travel: Evaluation Across 66 Languages in the Low-Resource Language Spoofing Corpus

Kirill Borodin; Vasiliy Kudryavtsev; Maxim Maslov; Mikhail Gorodnichev; Grach Mkrtchian

arXiv:2603.02364·cs.SD·April 23, 2026

TL;DR

This paper presents LRLspoof, a large multilingual synthetic speech dataset for evaluating cross-lingual spoof detection, revealing significant language-dependent disparities in model performance across 66 languages.

Contribution

Introduction of LRLspoof, a comprehensive multilingual synthetic speech corpus for cross-lingual spoof detection, and an analysis of model robustness across diverse languages.

Findings

01 Spoof rejection varies significantly across languages.

02 Language acts as an independent domain shift factor.

03 Benchmarking reveals model-dependent cross-lingual disparities.

Abstract

We introduce LRLspoof, a large-scale multilingual synthetic-speech corpus for cross-lingual spoof detection, comprising 2,732 hours of audio generated with 24 open-source TTS systems across 66 languages, including 45 low-resource languages under our operational definition. To evaluate robustness without requiring target-domain bonafide speech, we benchmark 11 publicly available countermeasures using threshold transfer: for each model we calibrate an EER operating point on pooled external benchmarks and apply the resulting threshold, reporting spoof rejection rate (SRR). Results show model-dependent cross-lingual disparity, with spoof rejection varying markedly across languages even under controlled conditions, highlighting language as an independent source of domain shift in spoof detection. The dataset is publicly available at…
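The threshold-transfer protocol described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names (`eer_threshold`, `spoof_rejection_rate`, `srr_by_language`), the score convention (a higher score means "more likely bonafide"), and the toy scores are all assumptions made for illustration. The idea is to pick the threshold where false acceptance and false rejection are closest on a calibration set that contains both classes, then apply that fixed threshold to spoof-only audio per language and report the fraction rejected.

```python
import numpy as np


def eer_threshold(bonafide_scores, spoof_scores):
    """Return (threshold, eer) at the point where FAR and FRR are closest.

    Assumption: higher score = more bonafide; a sample is accepted as
    bonafide when score >= threshold.
    """
    bonafide = np.asarray(bonafide_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)
    # Every observed score is a candidate threshold.
    candidates = np.unique(np.concatenate([bonafide, spoof]))
    # False acceptance rate: spoof samples accepted as bonafide.
    far = np.array([(spoof >= t).mean() for t in candidates])
    # False rejection rate: bonafide samples rejected as spoof.
    frr = np.array([(bonafide < t).mean() for t in candidates])
    i = int(np.argmin(np.abs(far - frr)))
    return float(candidates[i]), float((far[i] + frr[i]) / 2)


def spoof_rejection_rate(spoof_scores, threshold):
    """Fraction of spoof samples scored below the transferred threshold."""
    spoof = np.asarray(spoof_scores, dtype=float)
    return float((spoof < threshold).mean())


def srr_by_language(scores_by_language, threshold):
    """Apply one fixed threshold to spoof-only scores grouped by language."""
    return {
        lang: spoof_rejection_rate(scores, threshold)
        for lang, scores in scores_by_language.items()
    }


# Toy calibration scores standing in for pooled external benchmarks (made up).
calib_bonafide = [0.9, 0.8, 0.7, 0.6]
calib_spoof = [0.1, 0.2, 0.3, 0.65]
threshold, eer = eer_threshold(calib_bonafide, calib_spoof)

# Toy spoof-only scores for two hypothetical languages (made up).
target_scores = {
    "lang_a": [0.1, 0.2, 0.3, 0.4],
    "lang_b": [0.1, 0.7, 0.5, 0.9],
}
per_language_srr = srr_by_language(target_scores, threshold)
```

Because the threshold is fixed before any target-language audio is scored, no bonafide speech in the target language is needed; differences in `per_language_srr` then reflect how the same operating point behaves on each language's synthetic speech.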

Code & Models

Repositories

https://huggingface.co/datasets/lab260/LRLspoof

Datasets

lab260/LRLspoof
dataset· 3.3k dl
