Physics-Guided Deepfake Detection for Voice Authentication Systems

Alireza Mohammadi, Keshav Sood, Dhananjay Thiruvady, and Asef Nazari

arXiv:2512.06040 · cs.SD · December 9, 2025

Open Access

TL;DR

This paper introduces a physics-guided deepfake detection framework for voice authentication that combines interpretable physics features with self-supervised learning and uncertainty estimation to enhance robustness against attacks.

Contribution

It presents a novel multi-modal ensemble architecture integrating physics-based features and uncertainty-aware learning for improved deepfake detection in voice systems.

Findings

01. Enhanced robustness to deepfake attacks

02. Effective detection of control-plane poisoning

03. Improved uncertainty estimation in voice authentication

Abstract

Voice authentication systems deployed at the network edge face two threats: (a) sophisticated deepfake synthesis attacks and (b) control-plane poisoning in distributed federated learning protocols. We present a framework that couples physics-guided deepfake detection with uncertainty-aware edge learning. The framework fuses interpretable physics features that model vocal tract dynamics with representations from a self-supervised learning module. The resulting representations are processed by a Multi-Modal Ensemble Architecture, followed by a Bayesian ensemble that provides uncertainty estimates. By combining physics-based evaluation of audio characteristics with per-sample uncertainty estimates, the framework remains robust to both advanced deepfake attacks and sophisticated control-plane poisoning, addressing the complete threat model for networked voice authentication.
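The abstract does not say how the two feature streams are fused or how the ensemble's uncertainty is computed. As a rough illustration only, the sketch below assumes simple concatenation for fusion and the entropy decomposition commonly used with ensembles (total uncertainty split into an aleatoric and an epistemic part). The function names `fuse_features` and `ensemble_uncertainty` are hypothetical and do not come from the paper.

```python
import math
from typing import Dict, List, Sequence


def fuse_features(physics: Sequence[float], ssl: Sequence[float]) -> List[float]:
    """Assumed fusion step: concatenate physics and self-supervised features.

    The paper may use a different fusion scheme; this is a stand-in.
    """
    return list(physics) + list(ssl)


def binary_entropy(p: float) -> float:
    """Entropy in nats of a Bernoulli distribution with parameter p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def ensemble_uncertainty(member_probs: Sequence[float]) -> Dict[str, float]:
    """Summarise the ensemble members' P(fake) for one audio sample.

    total     : entropy of the averaged prediction
    aleatoric : average entropy of the individual members
    epistemic : total - aleatoric (disagreement between members)
    """
    if not member_probs:
        raise ValueError("need at least one ensemble member")
    n = len(member_probs)
    mean_p = sum(member_probs) / n
    total = binary_entropy(mean_p)
    aleatoric = sum(binary_entropy(p) for p in member_probs) / n
    # Clamp tiny negative values caused by floating-point rounding.
    epistemic = max(0.0, total - aleatoric)
    return {
        "p_fake": mean_p,
        "total": total,
        "aleatoric": aleatoric,
        "epistemic": epistemic,
    }


if __name__ == "__main__":
    # Members that agree give low epistemic uncertainty;
    # members that disagree give a higher value.
    print(ensemble_uncertainty([0.9, 0.9, 0.9]))
    print(ensemble_uncertainty([0.1, 0.9]))
```

Under these assumptions, a sample whose ensemble members disagree strongly gets a high epistemic score and could be flagged for rejection or further checks rather than being accepted on the mean score alone.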

Taxonomy

Topics: Speech Recognition and Synthesis · Adversarial Robustness in Machine Learning · Music and Audio Processing