HoliAntiSpoof: Audio LLM for Holistic Speech Anti-Spoofing
Xuenan Xu, Yiming Ren, Liwei Liu, Wen Wu, Baoxiang Li, Chaochao Lu, Shuai Wang, Chao Zhang

TL;DR
HoliAntiSpoof is an audio large language model framework for holistic speech anti-spoofing. It reasons jointly over spoofing techniques, affected speech attributes, and semantic impact, improving both detection and interpretability.
Contribution
The paper presents the first audio large language model (ALLM) framework for speech anti-spoofing. It reformulates detection as a text generation task, integrates semantic-level analysis, and introduces a new benchmark, DailyTalkEdit.
Findings
HoliAntiSpoof outperforms traditional baselines in multiple settings.
In-context learning improves out-of-domain generalization.
ALLMs enable interpretable analysis of spoofing behaviors.
Abstract
Recent advances in speech synthesis and editing have made speech spoofing increasingly challenging. However, most existing methods treat spoofing as binary classification, overlooking that diverse spoofing techniques manipulate multiple, coupled speech attributes and their semantic effects. In this paper, we introduce HoliAntiSpoof, the first audio large language model (ALLM) framework for holistic speech anti-spoofing analysis. HoliAntiSpoof reformulates spoofing analysis as a unified text generation task, enabling joint reasoning over spoofing methods, affected speech attributes, and their semantic impacts. To support semantic-level analysis, we introduce DailyTalkEdit, a new anti-spoofing benchmark that simulates realistic conversational manipulations and provides annotations of semantic influence. Extensive experiments demonstrate that HoliAntiSpoof outperforms conventional…
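To make the "unified text generation" framing concrete, here is a minimal sketch of how a holistic annotation might be serialized into a text target for an ALLM and parsed back from its output. The schema, field names, and JSON format are assumptions made for illustration only; the abstract does not specify the output format HoliAntiSpoof actually uses.

```python
import json
from dataclasses import dataclass, asdict, field
from typing import List


@dataclass
class SpoofAnalysis:
    """Hypothetical holistic annotation; every field name is an assumption."""
    is_spoofed: bool
    spoof_method: str  # e.g. "tts", "voice_conversion", "editing", or "none"
    affected_attributes: List[str] = field(default_factory=list)
    semantic_impact: str = ""


def to_target_text(analysis: SpoofAnalysis) -> str:
    """Serialize the annotation into one string a model could be trained to generate."""
    return json.dumps(asdict(analysis), sort_keys=True)


def parse_model_output(text: str) -> SpoofAnalysis:
    """Parse generated text back into a structured annotation."""
    return SpoofAnalysis(**json.loads(text))


def binary_label(analysis: SpoofAnalysis) -> str:
    """Recover the conventional binary decision from the richer output."""
    return "spoof" if analysis.is_spoofed else "bonafide"


# Made-up example annotation, not taken from DailyTalkEdit.
example = SpoofAnalysis(
    is_spoofed=True,
    spoof_method="editing",
    affected_attributes=["content"],
    semantic_impact="A negation was inserted, reversing the stated intent.",
)
target = to_target_text(example)
restored = parse_model_output(target)
```

Under this framing, the binary decision that conventional detectors output becomes one field of a richer generated answer, which is why a single model can be scored on detection while also describing the method, the affected attributes, and the semantic effect.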
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Voice and Speech Disorders
