An RFP dataset for Real, Fake, and Partially fake audio detection

Abdulazeez AlAli; George Theodorakopoulos

arXiv:2404.17721 · cs.SD · April 30, 2024

TL;DR

This paper introduces the RFP dataset with various fake audio types, highlighting the challenge of detecting partial fake audio and evaluating detection models' performance on this more complex data.

Contribution

The paper presents a new dataset including partial fake audio, noise, voice conversion, and TTS, addressing limitations of existing datasets that only contain fully fake audio.

Findings

01. Detection models perform worse on partially fake (PF) audio, incurring a markedly higher EER than on entirely fake audio.

02. The lowest EER achieved was 25.42%, indicating room for improvement.

03. Including diverse fake audio types in datasets is crucial for robust detection.
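
For readers unfamiliar with the metric, the equal error rate (EER) is the error rate at the decision threshold where the false acceptance rate (fake audio accepted as real) equals the false rejection rate (real audio rejected as fake). The following is a minimal sketch, assuming each clip receives a single score where a higher value means "more likely real". The function name and the toy scores are illustrative and are not taken from the paper.

```python
def equal_error_rate(real_scores, fake_scores):
    """Approximate the EER by sweeping every observed score as a threshold.

    Assumption: a higher score means 'more likely real', and a clip is
    accepted as real when its score is at or above the threshold.
    """
    if not real_scores or not fake_scores:
        raise ValueError("need at least one score per class")

    thresholds = sorted(set(real_scores) | set(fake_scores))
    thresholds.append(thresholds[-1] + 1.0)  # a threshold that rejects every clip

    best_gap, eer = None, None
    for t in thresholds:
        # False acceptance rate: fake clips scored at or above the threshold.
        far = sum(s >= t for s in fake_scores) / len(fake_scores)
        # False rejection rate: real clips scored below the threshold.
        frr = sum(s < t for s in real_scores) / len(real_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            # With finite data the two rates rarely cross exactly,
            # so report their average at the closest point.
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy scores (made up for illustration): one real clip scores below one fake clip.
toy_real = [0.9, 0.8, 0.7, 0.4]
toy_fake = [0.6, 0.3, 0.2, 0.1]
toy_eer = equal_error_rate(toy_real, toy_fake)
```

An EER of 0 means the two classes are perfectly separable by some threshold, so lower is better. Evaluation toolkits typically interpolate between thresholds instead of averaging at the nearest one, so values from this sketch may differ slightly from published figures.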

Abstract

Recent advances in deep learning have enabled the creation of natural-sounding synthesised speech. However, attackers have also utilised these technologies to conduct attacks such as phishing. Numerous public datasets have been created to facilitate the development of effective detection models. However, available datasets contain only entirely fake audio; therefore, detection models may miss attacks that replace a short section of the real audio with fake audio. In recognition of this problem, the current paper presents the RFP dataset, which comprises five distinct audio types: partial fake (PF), audio with noise, voice conversion (VC), text-to-speech (TTS), and real. The data are then used to evaluate several detection models, revealing that the available detection models incur a markedly higher equal error rate (EER) when detecting PF audio instead of entirely fake audio. The…
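
To make the partial fake (PF) attack concrete, the sketch below replaces a short section of a real waveform with samples from a fake one and records which samples were replaced. This is only an illustration of the idea described in the abstract. The excerpt does not describe how the RFP dataset was actually constructed, and the function name, parameters, and toy waveforms here are hypothetical.

```python
def splice_partial_fake(real, fake, start, length):
    """Replace `length` samples of `real`, beginning at `start`, with fake samples.

    Hypothetical helper for illustration only. Returns the spliced waveform
    and a per-sample label list (0 = real, 1 = fake).
    """
    if start < 0 or length <= 0 or start + length > len(real):
        raise ValueError("replaced section must lie inside the real waveform")
    if length > len(fake):
        raise ValueError("not enough fake samples to fill the section")

    end = start + length
    # Keep the real audio before and after the section; insert fake audio between.
    spliced = list(real[:start]) + list(fake[:length]) + list(real[end:])
    labels = [0] * start + [1] * length + [0] * (len(real) - end)
    return spliced, labels


# Toy waveforms (made up): ten "real" samples and four "fake" samples.
toy_real_wave = [0.0] * 10
toy_fake_wave = [1.0] * 4
spliced, labels = splice_partial_fake(toy_real_wave, toy_fake_wave, start=3, length=4)
```

Because most of the spliced clip is still genuine, a detector that outputs one score per clip sees largely real evidence, which is consistent with the higher EER the abstract reports for PF audio.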

Taxonomy

Topics: Music and Audio Processing · Digital Media Forensic Detection · Speech and Audio Processing