An RFP dataset for Real, Fake, and Partially fake audio detection
Abdulazeez AlAli, George Theodorakopoulos

TL;DR
This paper introduces the RFP dataset with various fake audio types, highlighting the challenge of detecting partial fake audio and evaluating detection models' performance on this more complex data.
Contribution
The paper presents a new dataset including partial fake audio, noise, voice conversion, and TTS, addressing limitations of existing datasets that only contain fully fake audio.
Findings
Detection models perform worse on partial fake (PF) audio, incurring a markedly higher equal error rate (EER) than on entirely fake audio.
The lowest EER achieved was 25.42%, which leaves room for improvement.
Including diverse fake audio types in datasets is crucial for robust detection.
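For readers unfamiliar with the metric, the EER quoted above is the error rate at the decision threshold where the false positive rate and the false negative rate are equal. The following is a minimal sketch of how it could be computed from detector scores, not the authors' evaluation code. The function name `equal_error_rate`, the score convention (higher means more likely fake), and the label encoding (1 = fake, 0 = real) are all assumptions made for illustration.

```python
import numpy as np

def equal_error_rate(scores, labels):
    """Approximate the EER by scanning every observed score as a threshold.

    Assumed conventions (hypothetical, not from the paper):
      scores: higher value = detector believes the clip is fake
      labels: 1 = fake, 0 = real
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)

    best_gap, eer = np.inf, 1.0
    for threshold in np.unique(scores):
        predicted_fake = scores >= threshold
        # Real clips wrongly flagged as fake.
        false_positive_rate = np.mean(predicted_fake[labels == 0])
        # Fake clips wrongly accepted as real.
        false_negative_rate = np.mean(~predicted_fake[labels == 1])
        gap = abs(false_positive_rate - false_negative_rate)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest;
            # report their average as the EER estimate.
            best_gap = gap
            eer = (false_positive_rate + false_negative_rate) / 2
    return eer
```

With perfectly separated scores, such as `equal_error_rate([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1])`, the sketch returns 0.0. An EER of 25.42% would correspond to a return value of about 0.2542, meaning roughly one in four clips of each class is misclassified at the balanced threshold.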
Abstract
Recent advances in deep learning have enabled the creation of natural-sounding synthesised speech. However, attackers have also utilised these technologies to conduct attacks such as phishing. Numerous public datasets have been created to facilitate the development of effective detection models. However, available datasets contain only entirely fake audio; therefore, detection models may miss attacks that replace a short section of the real audio with fake audio. In recognition of this problem, the current paper presents the RFP dataset, which comprises five distinct audio types: partial fake (PF), audio with noise, voice conversion (VC), text-to-speech (TTS), and real. The data are then used to evaluate several detection models, revealing that the available detection models incur a markedly higher equal error rate (EER) when detecting PF audio instead of entirely fake audio. The…
Taxonomy
Topics: Music and Audio Processing · Digital Media Forensic Detection · Speech and Audio Processing
