A Practical Guide to Logical Access Voice Presentation Attack Detection

Xin Wang; Junichi Yamagishi

arXiv:2201.03321 · eess.AS · January 11, 2022

TL;DR

This paper provides a comprehensive, practical overview of voice presentation attack detection, focusing on logical access spoofing methods like speech synthesis and voice conversion, with experimental insights and open-source code.

Contribution

It offers a detailed guide on voice PAD techniques, experimental evaluation on benchmark datasets, and open-source code to facilitate research and development.

Findings

1. Recent PAD methods effectively detect speech synthesis artifacts
2. Benchmark datasets enable standardized evaluation of PAD models
3. Open-source code supports reproducibility and further research

Abstract

Voice-based human-machine interfaces with an automatic speaker verification (ASV) component are commonly used in the market. However, the threat from presentation attacks is also growing since attackers can use recent speech synthesis technology to produce a natural-sounding voice of a victim. Presentation attack detection (PAD) for ASV, or speech anti-spoofing, is therefore indispensable. Research on voice PAD has seen significant progress since the early 2010s, including the advancement in PAD models, benchmark datasets, and evaluation campaigns. This chapter presents a practical guide to the field of voice PAD, with a focus on logical access attacks using text-to-speech and voice conversion algorithms and spoofing countermeasures based on artifact detection. It introduces the basic concept of voice PAD, explains the common techniques, and provides an experimental study using recent…

Code & Models

Repositories

nii-yamagishilab/project-NN-Pytorch-scripts
PyTorch · Official

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and Audio Processing