Neural Codec-based Adversarial Sample Detection for Speaker Verification

Xuanjun Chen; Jiawei Du; Haibin Wu; Jyh-Shing Roger Jang; Hung-yi Lee

arXiv:2406.04582·eess.AS·June 10, 2024

Open Access

TL;DR

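This paper introduces a neural codec-based method for detecting adversarial samples in automatic speaker verification (ASV). It re-synthesizes each input with a neural codec and flags the input as adversarial when the ASV score differs markedly between the original and the re-synthesized audio. The best codec tested surpasses seven prior state-of-the-art detection methods.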

Contribution

The study proposes a neural codec-based approach to adversarial sample detection in speaker verification. With a single model, it outperforms the seven prior state-of-the-art detection methods it is compared against, including ensemble models.

Findings

01. Descript-audio-codec achieves the highest detection rate among the 15 neural codecs tested.

02. The single-model approach surpasses prior state-of-the-art detection methods, including ensemble methods.

03. The method distinguishes genuine from adversarial samples by the change in ASV score after codec re-synthesis.

Abstract

Automatic Speaker Verification (ASV), increasingly used in security-critical applications, is vulnerable to a growing number of adversarial attacks, and few effective defenses are available. In this paper, we propose a neural codec-based adversarial sample detection method for ASV. The approach leverages the codec's ability to discard redundant perturbations and retain essential information. Specifically, we distinguish between genuine and adversarial samples by comparing the ASV scores of the original audio and of the audio re-synthesized by codec models. This comprehensive study explores all open-source neural codecs and their variant models for experiments. The Descript-audio-codec model stands out by delivering the highest detection rate among 15 neural codecs and surpassing seven prior state-of-the-art (SOTA) detection methods. Notably, our single-model method even outperforms a SOTA…
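The score-difference idea described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_asv_score` (a plain dot product) and `toy_codec` (rounding to one decimal) are hypothetical stand-ins for a real ASV model and a real neural codec, and picking the threshold as a quantile of genuine-sample differences is an assumed decision rule that the abstract does not specify.

```python
import numpy as np


def score_difference(asv_score, resynthesize, enroll, test):
    """Absolute change in ASV score after the test input passes through the codec."""
    original = asv_score(enroll, test)
    resynthesized = asv_score(enroll, resynthesize(test))
    return abs(original - resynthesized)


def threshold_from_genuine(genuine_diffs, false_positive_rate=0.01):
    """Assumed rule: flag roughly `false_positive_rate` of genuine samples."""
    diffs = np.asarray(genuine_diffs, dtype=float)
    return float(np.quantile(diffs, 1.0 - false_positive_rate))


def is_adversarial(diff, threshold):
    """A large score change after re-synthesis suggests an adversarial input."""
    return diff > threshold


# Hypothetical stand-ins, for illustration only.
def toy_asv_score(enroll, test):
    # Toy similarity: dot product of two "utterance" vectors.
    return float(np.dot(enroll, test))


def toy_codec(x):
    # Toy codec: coarse quantization that discards small perturbations.
    return np.round(x, 1)


enroll = np.full(100, 0.1)
genuine = np.full(100, 0.1)        # survives quantization unchanged
adversarial = np.full(100, 0.04)   # many tiny perturbations that add up in the score

genuine_diff = score_difference(toy_asv_score, toy_codec, enroll, genuine)
adversarial_diff = score_difference(toy_asv_score, toy_codec, enroll, adversarial)
threshold = threshold_from_genuine([0.0, 0.01, 0.02, 0.03], false_positive_rate=0.25)

print(genuine_diff, adversarial_diff, threshold)
print(is_adversarial(adversarial_diff, threshold))
```

In this toy setting the genuine input is unchanged by the codec, so its score difference is zero, while the perturbation-only input loses its score after quantization and is flagged. With a real system, `asv_score` and `resynthesize` would wrap the actual ASV model and codec.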

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing