Physics-Guided Deepfake Detection for Voice Authentication Systems
Alireza Mohammadi, Keshav Sood, Dhananjay Thiruvady, and Asef Nazari

TL;DR
This paper introduces a physics-guided deepfake detection framework for voice authentication that combines interpretable physics features with self-supervised learning and uncertainty estimation to enhance robustness against attacks.
Contribution
It presents a novel multi-modal ensemble architecture integrating physics-based features and uncertainty-aware learning for improved deepfake detection in voice systems.
Findings
Enhanced robustness to deepfake attacks
Effective detection of control-plane poisoning
Improved uncertainty estimation in voice authentication
Abstract
Voice authentication systems deployed at the network edge face dual threats: a) sophisticated deepfake synthesis attacks and b) control-plane poisoning in distributed federated learning protocols. We present a framework coupling physics-guided deepfake detection with uncertainty-aware in edge learning. The framework fuses interpretable physics features modeling vocal tract dynamics with representations coming from a self-supervised learning module. The representations are then processed via a Multi-Modal Ensemble Architecture, followed by a Bayesian ensemble providing uncertainty estimates. Incorporating physics-based characteristics evaluations and uncertainty estimates of audio samples allows our proposed framework to remain robust to both advanced deepfake attacks and sophisticated control-plane poisoning, addressing the complete threat model for networked voice authentication.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsSpeech Recognition and Synthesis · Adversarial Robustness in Machine Learning · Music and Audio Processing
