MultiAPI Spoof: A Multi-API Dataset and Local-Attention Network for Speech Anti-spoofing Detection
Xueping Zhang, Zhenshan Zhang, Yechen Wang, Linxi Li, Liwei Jin, Ming Li

TL;DR
This paper introduces a diverse multi-API speech spoofing dataset and a novel local-attention neural network, Nes2Net-LA, to improve detection accuracy and robustness against varied and unseen spoofing methods.
Contribution
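The paper presents the MultiAPI Spoof dataset, with about 230 hours of synthetic speech from 30 APIs, defines an API tracing task for attributing spoofed audio to its generation source, and proposes Nes2Net-LA, a local-attention network that improves local context modeling and fine-grained spoofing feature extraction.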
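The general idea of local attention, where each time frame attends only to nearby frames rather than the whole utterance, can be illustrated with a generic sliding-window self-attention. This is a minimal sketch under stated assumptions, not the Nes2Net-LA block: the summary gives no layer details, so the function name `local_self_attention`, the `window` parameter, and the use of identity query/key/value projections are all hypothetical choices made to keep the example short.

```python
import math
from typing import List

Vector = List[float]


def local_self_attention(frames: List[Vector], window: int) -> List[Vector]:
    """Hypothetical sliding-window self-attention (not the paper's layer).

    Frame t attends only to frames in [t - window, t + window]. Queries,
    keys and values are the raw frames (identity projections); a trained
    layer would learn separate projection matrices.
    """
    if window < 0:
        raise ValueError("window must be non-negative")
    num_frames = len(frames)
    if num_frames == 0:
        return []
    dim = len(frames[0])
    scale = 1.0 / math.sqrt(dim)
    outputs: List[Vector] = []
    for t in range(num_frames):
        lo = max(0, t - window)
        hi = min(num_frames, t + window + 1)
        # Scaled dot-product scores between frame t and its local neighbours.
        scores = [scale * sum(a * b for a, b in zip(frames[t], frames[s]))
                  for s in range(lo, hi)]
        # Numerically stable softmax restricted to the local window.
        peak = max(scores)
        weights = [math.exp(sc - peak) for sc in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Output is the weighted sum of the neighbouring frames (values).
        out = [0.0] * dim
        for w, s in zip(weights, range(lo, hi)):
            for d in range(dim):
                out[d] += w * frames[s][d]
        outputs.append(out)
    return outputs
```

With `window=0` every frame attends only to itself and the input is returned unchanged; with a window at least as long as the sequence the function reduces to ordinary global self-attention. Restricting the window is one plausible way to emphasize short-term, frame-level artifacts, but whether Nes2Net-LA uses this exact mechanism is not stated in the summary.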
Findings
Nes2Net-LA achieves state-of-the-art detection performance.
The dataset enables more realistic evaluation of anti-spoofing methods.
Nes2Net-LA demonstrates superior robustness to unseen spoofing conditions.
Abstract
Existing speech anti-spoofing benchmarks rely on a narrow set of public models, creating a substantial gap from real-world scenarios in which commercial systems employ diverse, often proprietary APIs. To address this issue, we introduce MultiAPI Spoof, a multi-API audio anti-spoofing dataset comprising about 230 hours of synthetic speech generated by 30 distinct APIs, including commercial services, open-source models, and online platforms. Furthermore, we propose Nes2Net-LA, a local-attention enhanced variant of Nes2Net that improves local context modeling and fine-grained spoofing feature extraction. Based on this dataset, we also define the API tracing task, enabling fine-grained attribution of spoofed audio to its generation source. Experiments show that Nes2Net-LA achieves state-of-the-art performance and offers superior robustness, particularly under diverse and unseen spoofing…
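The API tracing task attributes a spoofed clip to the API that generated it. Assuming a closed-set formulation in which every test clip comes from one of the known APIs and some classifier emits one score per API, tracing reduces to a top-1 decision plus standard classification metrics. The helper names `trace_api` and `tracing_report` and the labels `api_a`/`api_b` are hypothetical; the summary does not specify the paper's tracing protocol or metrics.

```python
from collections import defaultdict
from typing import Dict, Sequence


def trace_api(scores: Dict[str, float]) -> str:
    """Return the API label with the highest score (top-1 attribution)."""
    return max(scores, key=scores.get)


def tracing_report(all_scores: Sequence[Dict[str, float]],
                   true_apis: Sequence[str]) -> Dict[str, float]:
    """Overall top-1 accuracy plus per-API recall for a closed-set tracing run."""
    if len(all_scores) != len(true_apis):
        raise ValueError("scores and labels must have the same length")
    if not true_apis:
        raise ValueError("need at least one clip")
    hits: Dict[str, int] = defaultdict(int)
    counts: Dict[str, int] = defaultdict(int)
    for scores, truth in zip(all_scores, true_apis):
        counts[truth] += 1
        if trace_api(scores) == truth:
            hits[truth] += 1
    # Per-API recall shows which generation sources are hardest to attribute.
    report = {f"recall[{api}]": hits[api] / counts[api] for api in counts}
    report["accuracy"] = sum(hits.values()) / len(true_apis)
    return report
```

For example, with scores `[{"api_a": 0.9, "api_b": 0.1}, {"api_a": 0.2, "api_b": 0.8}, {"api_a": 0.6, "api_b": 0.4}]` and true labels `["api_a", "api_b", "api_b"]`, the report gives `recall[api_a] = 1.0`, `recall[api_b] = 0.5`, and an overall accuracy of 2/3. Reporting per-API recall alongside accuracy helps when the amount of audio per API is uneven.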
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Biometric Identification and Security
