HQ-MPSD: A Multilingual Artifact-Controlled Benchmark for Partial Deepfake Speech Detection

Menglu Li; Majd Alber; Ramtin Asgarianamiri; Lian Zhao; Xiao-Ping Zhang

arXiv:2512.13012·cs.SD·December 16, 2025

TL;DR

This paper introduces HQ-MPSD, a high-quality multilingual dataset for partial deepfake speech detection, and shows that current models struggle to generalize to realistic manipulations with minimal artifacts.

Contribution

The creation of HQ-MPSD, a large-scale, linguistically coherent, and naturalistic partial deepfake speech dataset that addresses limitations of previous datasets and provides a challenging benchmark.

Findings

01

State-of-the-art models perform poorly on HQ-MPSD, with performance dropping by more than 80%.

02

The dataset reveals significant generalization challenges in current detection methods.

03

HQ-MPSD's diversity and realism make it a more effective benchmark for future research.

Abstract

Detecting partial deepfake speech is challenging because manipulations occur only in short regions while the surrounding audio remains authentic. However, existing detection methods are fundamentally limited by the quality of available datasets, many of which rely on outdated synthesis systems and generation procedures that introduce dataset-specific artifacts rather than realistic manipulation cues. To address this gap, we introduce HQ-MPSD, a high-quality multilingual partial deepfake speech dataset. HQ-MPSD is constructed using linguistically coherent splice points derived from fine-grained forced alignment, preserving prosodic and semantic continuity and minimizing audible and visual boundary artifacts. The dataset contains 350.8 hours of speech across eight languages and 550 speakers, with background effects added to better reflect real-world acoustic conditions. MOS evaluations…
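To make the splicing idea in the abstract concrete, here is a minimal sketch of replacing one word at boundaries taken from a forced alignment and deriving frame-level labels for the fake region. It is an illustration under stated assumptions, not the authors' pipeline: the `WordSpan` structure, the function names, and the list-of-floats waveform are all hypothetical, and steps such as prosody matching, boundary smoothing, and background effects are left out.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class WordSpan:
    """One word from a forced alignment, in seconds (hypothetical structure)."""
    word: str
    start: float
    end: float


def splice_word(
    real: List[float],
    fake_word: List[float],
    span: WordSpan,
    sample_rate: int,
) -> Tuple[List[float], Tuple[int, int]]:
    """Replace the samples of one aligned word with synthetic samples.

    Returns the spliced waveform and the (start, end) sample indices of the
    fake region in the new waveform.
    """
    start = round(span.start * sample_rate)
    end = round(span.end * sample_rate)
    if not 0 <= start < end <= len(real):
        raise ValueError("word span lies outside the waveform")
    # Cutting at word boundaries keeps the splice linguistically coherent.
    spliced = real[:start] + fake_word + real[end:]
    return spliced, (start, start + len(fake_word))


def frame_labels(
    n_samples: int, fake_region: Tuple[int, int], frame_len: int
) -> List[int]:
    """Label each frame 1 if it overlaps the fake region, else 0."""
    fake_start, fake_end = fake_region
    labels = []
    for frame_start in range(0, n_samples, frame_len):
        frame_end = min(frame_start + frame_len, n_samples)
        overlaps = frame_start < fake_end and fake_start < frame_end
        labels.append(1 if overlaps else 0)
    return labels
```

In this sketch the synthetic word may be longer or shorter than the word it replaces, so the fake region is reported in the coordinates of the spliced waveform rather than the original one.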

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Face recognition and analysis