Profiling the Voice: Speaker-Specific Phoneme Fingerprinting for Speech Deepfake Detection

Jun Xue; Tong Zhang; Zhuolin Yi; Yihuan Huang; Yi Chai; Yiyang Zhang; Yanzhen Ren

arXiv:2605.17737 · cs.SD · May 19, 2026

TL;DR

This paper introduces Phoneme-based Voice Profiling (PVP), a speaker-specific, interpretable deepfake detection method that models unique phonetic traits to improve robustness against spoofing attacks.

Contribution

The paper presents a phoneme-level profiling framework that uses Gaussian Mixture Models (GMMs) for personalized, data-efficient, and interpretable deepfake detection, and it introduces a large-scale Chinese persons-of-interest (POI) dataset.

Findings

01. PVP outperforms state-of-the-art detectors in POI spoofing scenarios.

02. PVP achieves significant reductions in equal error rate (EER).

03. PVP provides phoneme-level interpretability for forensic analysis.
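
For readers unfamiliar with the metric, the EER is the operating point at which the false acceptance rate (spoofed audio accepted) equals the false rejection rate (bona fide audio rejected). The following is a minimal sketch of how it can be approximated from two lists of detector scores. The function name `compute_eer`, the convention that higher scores mean "more likely bona fide", and the toy scores are all assumptions made for illustration; none of them come from the paper.

```python
import numpy as np


def compute_eer(bonafide_scores, spoof_scores):
    """Approximate the equal error rate from two score lists.

    Assumption: a higher score means "more likely bona fide".
    Returns (eer, threshold) at the candidate threshold where the
    false acceptance and false rejection rates are closest.
    """
    bonafide = np.asarray(bonafide_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)

    # Every observed score is a candidate decision threshold.
    thresholds = np.sort(np.concatenate([bonafide, spoof]))

    # False rejection: bona fide samples scored below the threshold.
    frr = np.array([(bonafide < t).mean() for t in thresholds])
    # False acceptance: spoofed samples scored at or above the threshold.
    far = np.array([(spoof >= t).mean() for t in thresholds])

    idx = int(np.argmin(np.abs(far - frr)))
    return float((far[idx] + frr[idx]) / 2.0), float(thresholds[idx])


# Toy scores, invented purely for illustration.
toy_bonafide = [0.9, 0.8, 0.4, 0.7]
toy_spoof = [0.1, 0.5, 0.3, 0.2]
eer, threshold = compute_eer(toy_bonafide, toy_spoof)
print(f"EER = {eer:.2f} at threshold {threshold:.2f}")
```

A lower EER indicates better separation between bona fide and spoofed speech, so an "EER reduction" means the detector makes fewer errors at this balanced operating point.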

Abstract

The rapid advancement of generative AI has made audio deepfakes increasingly indistinguishable from authentic human vocals, posing significant threats to persons-of-interest (POI) such as public figures. Current detection systems primarily rely on generic, black-box models that fail to capture speaker-specific idiosyncratic traits and lack interpretability. In this paper, we propose Phoneme-based Voice Profiling (PVP), a novel personalized defense framework. By shifting the detection paradigm from macro-utterance analysis to micro-phonetic modeling, PVP captures the unique acoustic distributions underlying a POI's habitual articulatory patterns. Specifically, our framework models speaker-specific phonetic realizations using lightweight Gaussian Mixture Models (GMMs) estimated solely from bona fide reference speech. This design enables data-efficient profiling and robust generalization…
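
The core idea in the abstract, one lightweight GMM per phoneme fitted only on a speaker's bona fide speech, can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: it uses scikit-learn's `GaussianMixture`, assumes phoneme-aligned feature frames are already available, and substitutes synthetic random features for real acoustic features. The names `fit_phoneme_profiles` and `score_utterance`, the number of mixture components, and the mean-log-likelihood scoring rule are hypothetical choices.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def fit_phoneme_profiles(frames_by_phoneme, n_components=2, seed=0):
    """Fit one diagonal-covariance GMM per phoneme.

    frames_by_phoneme maps a phoneme label to an array of shape
    (n_frames, n_features) taken from bona fide reference speech.
    """
    profiles = {}
    for phoneme, frames in frames_by_phoneme.items():
        gmm = GaussianMixture(
            n_components=n_components,
            covariance_type="diag",
            random_state=seed,
        )
        profiles[phoneme] = gmm.fit(frames)
    return profiles


def score_utterance(profiles, segments):
    """Score an utterance given as (phoneme_label, frames) segments.

    Returns the mean per-phoneme average log-likelihood and the
    per-phoneme breakdown. Phonemes without a profile are skipped.
    """
    collected = {}
    for phoneme, frames in segments:
        if phoneme in profiles:
            # GaussianMixture.score gives the average log-likelihood per frame.
            collected.setdefault(phoneme, []).append(profiles[phoneme].score(frames))
    per_phoneme = {p: float(np.mean(v)) for p, v in collected.items()}
    overall = float(np.mean(list(per_phoneme.values()))) if per_phoneme else float("nan")
    return overall, per_phoneme


# Synthetic stand-ins for acoustic features (assumption: 4-dim frames).
rng = np.random.default_rng(0)
reference = {
    "a": rng.normal(0.0, 1.0, size=(200, 4)),
    "i": rng.normal(3.0, 1.0, size=(200, 4)),
}
profiles = fit_phoneme_profiles(reference)

# "Genuine" frames follow the reference distribution; "shifted" frames do not.
genuine = [("a", rng.normal(0.0, 1.0, size=(30, 4))), ("i", rng.normal(3.0, 1.0, size=(30, 4)))]
shifted = [("a", rng.normal(1.5, 1.0, size=(30, 4))), ("i", rng.normal(4.5, 1.0, size=(30, 4)))]

genuine_score, genuine_detail = score_utterance(profiles, genuine)
shifted_score, shifted_detail = score_utterance(profiles, shifted)
print("genuine:", genuine_score, "shifted:", shifted_score)
```

In this sketch, the per-phoneme breakdown returned alongside the overall score is what would allow an analyst to see which phonemes deviate most from the speaker's profile. Thresholding the overall score to reach a bona fide or spoof decision is left out, since the abstract does not specify how the final decision is made.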

Code & Models

Repositories

JunXue-tech/PVP (GitHub)
