Multilingual Source Tracing of Speech Deepfakes: A First Benchmark
Xi Xuan, Yang Xiao, Rohan Kumar Das, Tomi Kinnunen

TL;DR
This paper presents the first benchmark for tracing the source models of multilingual speech deepfakes, analyzing cross-lingual generalization and the impact of different modeling approaches.
Contribution
The paper introduces a benchmark for multilingual speech deepfake source tracing, comprising a dataset, an evaluation protocol, and a comparative analysis of DSP- and SSL-based modeling.
Findings
SSL representations improve cross-lingual generalization
Identifying the source model is challenging for unseen languages
The language on which an SSL representation is fine-tuned affects source tracing performance
Abstract
Recent progress in generative AI has made it increasingly easy to create natural-sounding deepfake speech from just a few seconds of audio. While these tools support helpful applications, they also raise serious concerns by making it possible to generate convincing fake speech in many languages. Current research has largely focused on detecting fake speech, but little attention has been given to tracing the source models used to generate it. This paper introduces the first benchmark for multilingual speech deepfake source tracing, covering both mono- and cross-lingual scenarios. We comparatively investigate DSP- and SSL-based modeling; examine how SSL representations fine-tuned on different languages impact cross-lingual generalization performance; and evaluate generalization to unseen languages and speakers. Our findings offer the first comprehensive insights into the challenges of…
