Classical Machine Learning Baselines for Deepfake Audio Detection on the Fake-or-Real Dataset

Faheem Ahmad; Ajan Ahmed; Masudul Imtiaz

arXiv:2604.13400 · eess.AS · April 16, 2026

TL;DR

This paper develops and evaluates simple, interpretable classical machine learning models using acoustic features to detect deepfake audio, providing a transparent baseline for future research.

Contribution

The paper introduces a classical machine learning baseline for deepfake audio detection on the Fake-or-Real dataset, with detailed feature analysis and statistical validation.
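The statistical validation includes ANOVA, which the abstract uses to flag features that differ between real and fake speech. As a rough illustration only (not the authors' code), the one-way ANOVA F statistic for a single feature is the between-group mean square divided by the within-group mean square. The function name `anova_f` and the sample values are hypothetical. Getting a p-value also requires the F distribution; SciPy's `scipy.stats.f_oneway` returns both.

```python
from statistics import fmean


def anova_f(*groups):
    """One-way ANOVA F statistic for one feature measured in k >= 2 groups.

    F = (SS_between / (k - 1)) / (SS_within / (n - k)), with n the total
    number of observations across all groups.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need >= 2 groups and more observations than groups")
    means = [fmean(g) for g in groups]
    grand_mean = fmean([x for g in groups for x in g])
    # Spread of the group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Spread of individual observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical per-clip values of one feature for real and fake clips.
real_clips = [1.0, 2.0, 3.0]
fake_clips = [4.0, 5.0, 6.0]
print(anova_f(real_clips, fake_clips))  # 13.5
```

With only two groups (real vs. fake), this F equals the square of the pooled two-sample t statistic. ANOVA therefore serves here as a per-feature mean-difference test.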

Findings

1. RBF SVM achieves ~93% accuracy and 7% EER.
2. Spectral features such as spectral centroid and bandwidth are key discriminative cues.
3. Linear models reach around 75% accuracy.
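The second finding singles out spectral centroid and bandwidth. The centroid is the magnitude-weighted mean frequency of a frame's spectrum. The bandwidth, in the p = 2 form that libraries such as librosa use by default, is the magnitude-weighted spread around that centroid. The sketch below computes both for a single frame with a naive DFT, so it needs only the standard library. The function names are hypothetical. This page does not describe the paper's framing, windowing, or aggregation over a two-second clip, so none of those are modeled here.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """|X[k]| for bins k = 0..N/2 via a naive O(N^2) DFT (illustration only; use an FFT in practice)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def centroid_and_bandwidth(frame, sample_rate):
    """Spectral centroid and (p = 2) bandwidth of one frame, both in Hz."""
    mags = magnitude_spectrum(frame)
    freqs = [k * sample_rate / len(frame) for k in range(len(mags))]
    total = sum(mags)
    if total == 0:
        return 0.0, 0.0  # silent frame: both features undefined, report 0
    centroid = sum(f * m for f, m in zip(freqs, mags)) / total
    spread = sum(m * (f - centroid) ** 2 for f, m in zip(freqs, mags)) / total
    return centroid, math.sqrt(spread)


# A pure 2 kHz tone at 16 kHz (the telephone-quality rate), 64 samples long.
sr = 16_000
tone = [math.sin(2 * math.pi * 2000 * t / sr) for t in range(64)]
c, bw = centroid_and_bandwidth(tone, sr)
print(round(c), round(bw))  # 2000 0
```

A pure tone concentrates its energy in one bin, so the centroid sits at the tone's frequency and the bandwidth is near zero. Noise-like or broadband frames yield larger bandwidths. A full pipeline would typically window each frame and average the per-frame values over the clip.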

Abstract

Deep learning has enabled highly realistic synthetic speech, raising concerns about fraud, impersonation, and disinformation. Despite rapid progress in neural detectors, transparent baselines are needed to reveal which acoustic cues reliably separate real from synthetic speech. This paper presents an interpretable classical machine learning baseline for deepfake audio detection using the Fake-or-Real (FoR) dataset. We extract prosodic, voice-quality, and spectral features from two-second clips at 44.1 kHz (high-fidelity) and 16 kHz (telephone-quality) sampling rates. Statistical analysis (ANOVA, correlation heatmaps) identifies features that differ significantly between real and fake speech. We then train multiple classifiers (Logistic Regression, LDA, QDA, Gaussian Naive Bayes, SVMs, and GMMs) and evaluate performance using accuracy, ROC-AUC, EER, and DET curves. Pairwise McNemar's…
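The abstract lists EER (equal error rate) among the evaluation metrics. EER is the operating point where the false-positive rate (real speech flagged as fake) equals the false-negative rate (fake speech passed as real). This page does not say how the authors compute it. The sketch below is a simple threshold sweep that assumes label 1 = fake, label 0 = real, and that a higher score means "more likely fake". The function name `equal_error_rate` is hypothetical. Libraries often interpolate along the ROC curve instead, which gives a slightly finer estimate.

```python
def equal_error_rate(scores, labels):
    """Approximate EER by sweeping thresholds over the observed scores.

    Assumes labels: 1 = fake (positive class), 0 = real; a clip is flagged as
    fake when score >= threshold. Returns (eer, threshold).
    """
    fake = [s for s, y in zip(scores, labels) if y == 1]
    real = [s for s, y in zip(scores, labels) if y == 0]
    if not fake or not real:
        raise ValueError("need at least one real and one fake clip")
    best_gap, best_eer, best_tau = float("inf"), None, None
    for tau in sorted(set(scores)):
        fpr = sum(s >= tau for s in real) / len(real)  # real clips flagged as fake
        fnr = sum(s < tau for s in fake) / len(fake)   # fake clips passed as real
        # Keep the threshold where the two error rates are closest.
        if abs(fpr - fnr) < best_gap:
            best_gap, best_eer, best_tau = abs(fpr - fnr), (fpr + fnr) / 2, tau
    return best_eer, best_tau


# Hypothetical classifier scores for two real and two fake clips.
print(equal_error_rate([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # (0.5, 0.4)
```

Under this convention, the 7% EER reported for the RBF SVM would mean that, at some threshold, roughly 7% of real clips are flagged as fake and roughly 7% of fake clips slip through.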
