LJ-Spoof: A Generatively Varied Corpus for Audio Anti-Spoofing and Synthesis Source Tracing

Surya Subramani; Hashim Ali; Hafiz Malik

arXiv:2601.07958 · cs.SD · January 14, 2026

TL;DR

LJ-Spoof is a single-speaker audio corpus that systematically varies synthesis parameters and training regimes, designed to support speaker-specific anti-spoofing and synthesis-source tracing.

Contribution

The paper introduces LJ-Spoof, a large audio dataset that systematically varies synthesis conditions, addressing the lack of such datasets for advancing anti-spoofing and source-tracing methods.

Findings

01. Enables robust speaker-conditioned anti-spoofing

02. Facilitates fine-grained synthesis-source tracing

03. Serves as a benchmark evaluation suite

Abstract

Speaker-specific anti-spoofing and synthesis-source tracing are central challenges in audio anti-spoofing. Progress has been hampered by the lack of datasets that systematically vary model architectures, synthesis pipelines, and generative parameters. To address this gap, we introduce LJ-Spoof, a speaker-specific, generatively diverse corpus that systematically varies prosody, vocoders, generative hyperparameters, bona fide prompt sources, training regimes, and neural post-processing. The corpus spans one speaker (including studio-quality recordings), 30 TTS families, 500 generatively variant subsets, 10 bona fide neural-processing variants, and more than 3 million utterances. This variation-dense design enables robust speaker-conditioned anti-spoofing and fine-grained synthesis-source tracing. We further position this dataset as both a practical reference training resource and a…

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Phonetics and Phonology Research