Faked Speech Detection with Zero Prior Knowledge
Sahar Al Ajmi, Khizar Hayat, Alaa M. Al Obaidi, Naresh Kumar, Munaf Najmuldeen, Baptiste Magnier

TL;DR
This paper presents a neural-network-based classifier that detects fake speech without reference recordings or other prior knowledge, achieving at least 94% accuracy and surpassing human listeners (85%) on the same task.
Contribution
Introduces a blind speech forgery detection method using a deep neural network trained on extracted audio features, effective across multiple languages.
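The summary does not say which audio features are extracted, so the following is only a minimal illustration of how a variable-length clip can be turned into a fixed-length feature vector. It computes two common frame-level descriptors, root-mean-square (RMS) energy and zero-crossing rate, and averages each over the clip. The choice of these two features, the 512-sample frame, the 256-sample hop and all helper names are hypothetical assumptions, not the authors' pipeline.

```python
import math


def frame_rms(frame):
    """Root-mean-square energy of one frame of samples."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def frame_zcr(frame):
    """Fraction of adjacent sample pairs whose sign changes (0 counts as positive)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def clip_features(samples, frame_len=512, hop=256):
    """Hypothetical helper: average per-frame RMS and ZCR into one clip-level vector."""
    if len(samples) < frame_len:
        raise ValueError("clip is shorter than one frame")
    rms_vals, zcr_vals = [], []
    # Slide a fixed-size window over the clip with the given hop size.
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        rms_vals.append(frame_rms(frame))
        zcr_vals.append(frame_zcr(frame))
    return [sum(rms_vals) / len(rms_vals), sum(zcr_vals) / len(zcr_vals)]
```

For example, `clip_features([1.0, -1.0] * 1024)` returns `[1.0, 1.0]`, because every frame has unit energy and changes sign at every sample. A fuller feature vector would append further descriptors (for example MFCCs) to the same list.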
Findings
Achieved at least 94% accuracy in classifying real vs. fake speech.
Outperformed human observers, who achieved 85% accuracy.
Validated on datasets of English and mixed-language audio.
Abstract
Audio is one of the most widely used means of human communication, but it can also easily be misused to deceive people. With the AI revolution, the related technologies are now accessible to almost everyone, making it simple for criminals to commit crimes and forgeries. In this work, we introduce a neural network method to develop a classifier that blindly classifies an input audio clip as real or mimicked; the word 'blindly' refers to the ability to detect mimicked audio without references or real sources. We propose a deep neural network following a sequential model that comprises three hidden layers, with alternating dense and dropout layers. The proposed model was trained on a set of 26 important features extracted from a large dataset of audio clips, yielding a classifier that was tested on the same set of features extracted from different audio clips. The data was extracted from two raw…
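The architecture described in the abstract can be sketched as a plain, dependency-free forward pass. This is a minimal sketch under stated assumptions: the abstract fixes only the 26 input features and three hidden dense layers alternating with dropout, so the hidden widths (64, 32, 16), the dropout rate of 0.3, the ReLU activations, the He-style initialization and the single sigmoid output read as P(fake) are all hypothetical choices, and the class and function names are illustrative. Training (for example, binary cross-entropy minimized by gradient descent) is omitted.

```python
import math
import random


def dense(inputs, weights, biases):
    """Fully connected layer: out[j] = sum_i inputs[i] * weights[i][j] + biases[j]."""
    return [sum(x * w[j] for x, w in zip(inputs, weights)) + biases[j]
            for j in range(len(biases))]


def relu(v):
    return [max(0.0, x) for x in v]


def dropout(v, rate, training, rng):
    """Inverted dropout: during training zero each unit with probability `rate`
    and rescale survivors by 1 / (1 - rate); identity at inference time."""
    if not training or rate == 0.0:
        return list(v)
    keep = 1.0 - rate
    return [x / keep if rng.random() < keep else 0.0 for x in v]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def init_layer(n_in, n_out, rng):
    """He-style random weights and zero biases (hypothetical initialization)."""
    scale = math.sqrt(2.0 / n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_out)] for _ in range(n_in)]
    return weights, [0.0] * n_out


class FakeSpeechMLP:
    """Hypothetical sequential model:
    26 features -> 3 x (Dense + ReLU + Dropout) -> Dense(1) + sigmoid."""

    def __init__(self, n_features=26, hidden=(64, 32, 16), rate=0.3, seed=0):
        self.rng = random.Random(seed)
        self.rate = rate
        sizes = [n_features, *hidden]
        self.hidden = [init_layer(a, b, self.rng) for a, b in zip(sizes, sizes[1:])]
        self.out = init_layer(sizes[-1], 1, self.rng)

    def predict_proba(self, features, training=False):
        """Return the model's probability that one feature vector is fake."""
        h = list(features)
        for weights, biases in self.hidden:
            h = relu(dense(h, weights, biases))
            h = dropout(h, self.rate, training, self.rng)
        z = dense(h, *self.out)[0]
        return sigmoid(z)
```

With an all-zero feature vector and zero biases every pre-activation is 0, so `FakeSpeechMLP().predict_proba([0.0] * 26)` returns exactly 0.5; a threshold of 0.5 would then map scores to "fake" or "real". Inverted dropout is used so that the same weights apply unchanged at inference time with `training=False`.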
Taxonomy
Topics: Digital Media Forensic Detection · Speech and Audio Processing · Music and Audio Processing
