PhonemeDF: A Synthetic Speech Dataset for Audio Deepfake Detection and Naturalness Evaluation

Vamshi Nallaguntla; Aishwarya Fursule; Shruti Kshirsagar; Anderson R. Avila

arXiv:2603.15037 · cs.SD · March 17, 2026

Open Access

TL;DR

This paper introduces PhonemeDF, a phoneme-level synthetic speech dataset for evaluating audio deepfake detection and naturalness, using phoneme distribution divergence to assess fidelity and improve detection methods.

Contribution

The work provides a new phoneme-aligned dataset with real and synthetic speech, and demonstrates how phoneme distribution divergence correlates with deepfake detection performance.

Findings

01 Kullback-Leibler divergence (KLD) between real and synthetic phoneme distributions correlates with detection accuracy.

02 PhonemeDF enables evaluation of naturalness at the phoneme level.

03 KLD can identify the most discriminative phonemes for deepfake detection.
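
To make the KLD findings concrete, here is a minimal sketch of how a per-phoneme divergence ranking could be computed. It is not the authors' pipeline. The scalar feature (imagined here as phoneme duration in milliseconds), the histogram binning, the smoothing constant, the direction KL(real || synthetic), and every function name and toy value are assumptions made for illustration only.

```python
import math


def histogram(values, lo, hi, bins):
    """Count values in `bins` equal-width bins over [lo, hi].

    Out-of-range values are clipped into the first or last bin.
    """
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = int((v - lo) / width)
        idx = min(max(idx, 0), bins - 1)
        counts[idx] += 1
    return counts


def kl_divergence(p_counts, q_counts, eps=1e-6):
    """KL(P || Q) in nats for two histograms with the same bins.

    Additive smoothing (eps) avoids log(0) and division by zero.
    """
    p = [c + eps for c in p_counts]
    q = [c + eps for c in q_counts]
    p_total, q_total = sum(p), sum(q)
    return sum(
        (pi / p_total) * math.log((pi / p_total) / (qi / q_total))
        for pi, qi in zip(p, q)
    )


def rank_phonemes(real, synth, lo, hi, bins=10):
    """Return (phoneme, KLD) pairs, most divergent phoneme first.

    `real` and `synth` map a phoneme label to a list of feature values.
    Only phonemes present in both mappings are scored.
    """
    scores = {}
    for ph in real.keys() & synth.keys():
        p = histogram(real[ph], lo, hi, bins)
        q = histogram(synth[ph], lo, hi, bins)
        scores[ph] = kl_divergence(p, q)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy, made-up feature values (hypothetical durations in ms); not dataset values.
real = {"AA": [80, 90, 100, 110, 120], "S": [60, 70, 80, 90, 100]}
synth = {"AA": [82, 91, 99, 108, 121], "S": [140, 150, 160, 170, 180]}

ranking = rank_phonemes(real, synth, lo=0, hi=200, bins=10)
for phoneme, kld in ranking:
    print(f"{phoneme}: {kld:.3f}")
```

In this toy input the synthetic "S" values share no bins with the real ones, so "S" ranks as the most divergent phoneme, while "AA" stays close to zero. A phoneme with a high score under such a measure would be a candidate for the "most discriminative" role described above.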

Abstract

The growing sophistication of speech generated by Artificial Intelligence (AI) has introduced new challenges in audio deepfake detection. Text-to-speech (TTS) and voice conversion (VC) technologies can create highly convincing synthetic speech with high naturalness and intelligibility. This poses serious threats to voice biometric security and to systems designed to combat the spread of spoken misinformation, where synthetic voices may be used to disseminate false or malicious content. While interest in AI-generated speech has increased, resources for evaluating naturalness at the phoneme level remain limited. In this work, we address this gap by presenting the Phoneme-Level DeepFake dataset (PhonemeDF), comprising parallel real and synthetic speech segmented at the phoneme level. Real speech samples are derived from a subset of LibriSpeech, while synthetic samples are generated using four…

Taxonomy

Topics: Speech Recognition and Synthesis · Digital Media Forensic Detection · Voice and Speech Disorders