TL;DR
This paper introduces Phoneme-based Voice Profiling (PVP), a speaker-specific, interpretable deepfake detection method that models unique phonetic traits to improve robustness against spoofing attacks.
Contribution
The paper presents a novel phoneme-level profiling framework using Gaussian Mixture Models (GMMs) for personalized, data-efficient, and interpretable deepfake detection, along with a large-scale Chinese persons-of-interest (POI) dataset.
Findings
PVP outperforms state-of-the-art detectors in POI spoofing scenarios.
It achieves significant reductions in equal error rate (EER).
It provides phoneme-level interpretability for forensic analysis.
Abstract
The rapid advancement of generative AI has made audio deepfakes increasingly indistinguishable from authentic human vocals, posing significant threats to persons-of-interest (POI) such as public figures. Current detection systems primarily rely on generic, black-box models that fail to capture speaker-specific idiosyncratic traits and lack interpretability. In this paper, we propose Phoneme-based Voice Profiling (PVP), a novel personalized defense framework. By shifting the detection paradigm from macro-utterance analysis to micro-phonetic modeling, PVP captures the unique acoustic distributions underlying a POI's habitual articulatory patterns. Specifically, our framework models speaker-specific phonetic realizations using lightweight Gaussian Mixture Models (GMMs) estimated solely from bona fide reference speech. This design enables data-efficient profiling and robust generalization…
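The per-phoneme profiling idea described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the diagonal-covariance EM fit, the two-component mixtures, the average log-likelihood score, the toy 2-D features, and all function names (`fit_gmm`, `build_profile`, `score_utterance`, `toy_frames`) are hypothetical. The abstract is truncated, so the authors' actual features, scoring rule, and decision threshold are not known from this page.

```python
import math
import random


def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at frame x."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def logsumexp(vals):
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))


def component_logs(gmm, x):
    """Log of (weight * density) for each mixture component at frame x."""
    weights, means, variances = gmm
    return [math.log(w) + log_gauss(x, m, v)
            for w, m, v in zip(weights, means, variances)]


def fit_gmm(frames, k=2, iters=25, floor=1e-3):
    """Fit a k-component diagonal GMM with plain EM (assumed setup, not the paper's)."""
    n, d = len(frames), len(frames[0])
    # Deterministic init: evenly spaced frames (in sorted order) become the means.
    ordered = sorted(frames)
    means = [list(ordered[(2 * j + 1) * n // (2 * k)]) for j in range(k)]
    center = [sum(x[i] for x in frames) / n for i in range(d)]
    spread = [max(floor, sum((x[i] - center[i]) ** 2 for x in frames) / n)
              for i in range(d)]
    gmm = ([1.0 / k] * k, means, [spread[:] for _ in range(k)])
    for _ in range(iters):
        # E-step: responsibility of each component for each frame.
        resp = []
        for x in frames:
            logs = component_logs(gmm, x)
            total = logsumexp(logs)
            resp.append([math.exp(l - total) for l in logs])
        # M-step: re-estimate weights, means, and floored variances.
        weights, means, variances = [], [], []
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-10  # guards against empty components
            mean = [sum(r[j] * x[i] for r, x in zip(resp, frames)) / nj
                    for i in range(d)]
            var = [max(floor, sum(r[j] * (x[i] - mean[i]) ** 2
                                  for r, x in zip(resp, frames)) / nj)
                   for i in range(d)]
            weights.append(nj / n)
            means.append(mean)
            variances.append(var)
        gmm = (weights, means, variances)
    return gmm


def build_profile(reference):
    """One GMM per phoneme, fitted only on bona fide reference frames."""
    return {ph: fit_gmm(frames) for ph, frames in reference.items()}


def score_utterance(profile, segments):
    """Average log-likelihood per phoneme; phonemes missing from the profile are skipped."""
    scores = {}
    for ph, frames in segments.items():
        if ph in profile and frames:
            gmm = profile[ph]
            scores[ph] = sum(logsumexp(component_logs(gmm, x))
                             for x in frames) / len(frames)
    return scores


def toy_frames(rng, center, n, shift=0.0):
    """Synthetic stand-in for acoustic feature frames (illustration only)."""
    return [[rng.gauss(c + shift, 0.3) for c in center] for _ in range(n)]


rng = random.Random(0)
centers = {"a": (1.0, -1.0), "i": (-2.0, 0.5)}  # made-up phoneme clusters
profile = build_profile({ph: toy_frames(rng, c, 200) for ph, c in centers.items()})
genuine = score_utterance(profile, {ph: toy_frames(rng, c, 50) for ph, c in centers.items()})
shifted = score_utterance(profile, {ph: toy_frames(rng, c, 50, shift=1.5) for ph, c in centers.items()})
```

In this toy setup, `genuine` and `shifted` each map a phoneme label to a score, so a low score can be traced back to the specific phonemes whose realizations deviate from the reference profile. A real system would need forced alignment to obtain phoneme segments and a calibrated threshold, neither of which is shown here.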
