A Practical Guide to Logical Access Voice Presentation Attack Detection
Xin Wang, Junichi Yamagishi

TL;DR
This chapter gives a practical overview of voice presentation attack detection (PAD), focusing on logical access attacks that use speech synthesis and voice conversion, and includes an experimental study and open-source code.
Contribution
It offers a guide to voice PAD techniques, an experimental evaluation on benchmark datasets, and open-source code to facilitate research and development.
Findings
Recent PAD methods effectively detect speech synthesis artifacts
Benchmark datasets enable standardized evaluation of PAD models
Open-source code supports reproducibility and further research
Abstract
Voice-based human-machine interfaces with an automatic speaker verification (ASV) component are commonly used in the market. However, the threat from presentation attacks is also growing since attackers can use recent speech synthesis technology to produce a natural-sounding voice of a victim. Presentation attack detection (PAD) for ASV, or speech anti-spoofing, is therefore indispensable. Research on voice PAD has seen significant progress since the early 2010s, including the advancement in PAD models, benchmark datasets, and evaluation campaigns. This chapter presents a practical guide to the field of voice PAD, with a focus on logical access attacks using text-to-speech and voice conversion algorithms and spoofing countermeasures based on artifact detection. It introduces the basic concept of voice PAD, explains the common techniques, and provides an experimental study using recent…
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and Audio Processing
