Retrieval-Augmented Audio Deepfake Detection

Zuheng Kang; Yayun He; Botao Zhao; Xiaoyang Qu; Junqing Peng; Jing Xiao; Jianzong Wang

arXiv:2404.13892 · cs.SD · April 24, 2024

TL;DR

This paper introduces a retrieval-augmented detection (RAD) framework for audio deepfake detection that augments each test sample with similar retrieved samples. It reports state-of-the-art results on the ASVspoof 2021 DF set and competitive results on 2019 and 2021 LA.

Contribution

The paper proposes a retrieval-augmented detection (RAD) framework and extends the multi-fusion attentive classifier to integrate with it for improved audio deepfake detection.

Findings

01. Achieves state-of-the-art results on the ASVspoof 2021 DF set.

02. Outperforms baseline methods on multiple datasets.

03. Retrieval improves detection by focusing on speaker-specific acoustic features.

Abstract

With recent advances in speech synthesis including text-to-speech (TTS) and voice conversion (VC) systems enabling the generation of ultra-realistic audio deepfakes, there is growing concern about their potential misuse. However, most deepfake (DF) detection methods rely solely on the fuzzy knowledge learned by a single model, resulting in performance bottlenecks and transparency issues. Inspired by retrieval-augmented generation (RAG), we propose a retrieval-augmented detection (RAD) framework that augments test samples with similar retrieved samples for enhanced detection. We also extend the multi-fusion attentive classifier to integrate it with our proposed RAD framework. Extensive experiments show the superior performance of the proposed RAD framework over baseline methods, achieving state-of-the-art results on the ASVspoof 2021 DF set and competitive results on the 2019 and 2021 LA…
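
The retrieval idea in the abstract (augmenting a test sample with similar retrieved samples) can be illustrated with a minimal sketch. It assumes each utterance has already been mapped to a fixed-length embedding. The store contents, the names `retrieve_top_k` and `augment`, the cosine metric, and the use of simple concatenation are all hypothetical choices made for illustration; they are not taken from the paper, and the multi-fusion attentive classifier is not reproduced here.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(
    query: Vector, store: List[Tuple[str, Vector]], k: int
) -> List[Tuple[str, float]]:
    """Return the k store entries most similar to the query as (key, score) pairs."""
    scored = [(key, cosine_similarity(query, vec)) for key, vec in store]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


def augment(query: Vector, store: List[Tuple[str, Vector]], k: int) -> List[float]:
    """Concatenate the query with its k nearest stored embeddings.

    Concatenation is an assumption for illustration; a real system would
    pass the query and the retrieved samples to a learned classifier.
    """
    lookup = dict(store)
    augmented = list(query)
    for key, _score in retrieve_top_k(query, store, k):
        augmented.extend(lookup[key])
    return augmented


# Hypothetical store of reference embeddings (keys and values are made up).
store = [
    ("spk_a_utt1", [1.0, 0.0, 0.0]),
    ("spk_a_utt2", [0.9, 0.1, 0.0]),
    ("spk_b_utt1", [0.0, 1.0, 0.0]),
]
query = [1.0, 0.05, 0.0]

top = retrieve_top_k(query, store, k=2)
augmented = augment(query, store, k=2)
```

Here `top` holds the two entries closest to the query, and `augmented` is the query followed by those two embeddings. A downstream detector would consume this combined input rather than the query alone.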

Taxonomy

Methods: Sparse Evolutionary Training