BioLip: Language-Generalizable Lip-Sync Deepfake Detection via Biomechanical Constraint Violation Modeling
Hao Chen, Junnan Xu

TL;DR
This paper introduces a novel deepfake detection method based on biomechanical constraints of lip motion, which remains effective across different languages and generators by analyzing lip movement dynamics.
Contribution
The authors propose a language-generalizable lip-sync deepfake detector that uses biomechanical constraints and landmark motion statistics, avoiding reliance on pixel or audio artifacts.
Findings
Detects deepfakes by analyzing lip motion dynamics.
Effective across different languages and generative models.
Uses only landmark coordinates, no pixel or audio data.
Abstract
Existing lip-sync deepfake detectors rely on pixel artifacts or audio-visual correspondence, and both fail under generator or language shift because the features they learn are tied to the training distribution. We take a different approach. Real lip motion is constrained by tissue mechanics and neuromuscular bandwidth; current generators impose none of these constraints, producing trajectories with elevated variance in velocity, acceleration, and jerk that real speech does not exhibit. We exploit this as a detection signal, temporal lip jitter, by computing displacement, velocity, acceleration, and jerk statistics from 64 perioral landmarks over 25-frame windows and feeding them into a lightweight three-branch network. The model uses only landmark coordinates: no pixels, no audio, and no voiceprint data.
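To make the kinematic features concrete, here is a minimal sketch of how displacement, velocity, acceleration, and jerk statistics might be computed for one 25-frame window of 64 landmarks. It is an illustration, not the authors' implementation. The function name kinematic_stats, the 25 fps frame rate, the use of finite differences, the definition of displacement as offset from the window's first frame, and the choice of mean and variance as summary statistics are all assumptions. The three-branch network is not shown.

```python
import numpy as np

def kinematic_stats(landmarks, fps=25.0):
    """Summarize lip-landmark kinematics for one window.

    landmarks: array of shape (T, N, 2), e.g. (25, 64, 2) for a 25-frame
    window of 64 perioral landmarks (x, y). Hypothetical helper, not the
    paper's code.
    """
    landmarks = np.asarray(landmarks, dtype=float)
    dt = 1.0 / fps  # assumed frame interval in seconds
    feats = {}

    # Displacement: offset of each landmark from its position in frame 0
    # (an assumed definition).
    disp = np.linalg.norm(landmarks - landmarks[0], axis=-1)
    feats["displacement_mean"] = disp.mean()
    feats["displacement_var"] = disp.var()

    # Velocity, acceleration, jerk: successive finite differences in time.
    signal = landmarks
    for name in ("velocity", "acceleration", "jerk"):
        signal = np.diff(signal, axis=0) / dt
        mag = np.linalg.norm(signal, axis=-1)  # per-frame, per-landmark magnitude
        feats[name + "_mean"] = mag.mean()
        feats[name + "_var"] = mag.var()
    return feats

# Synthetic demo: a smooth constant-velocity window versus a jittered copy.
rng = np.random.default_rng(0)
t = np.arange(25, dtype=float)
smooth = np.zeros((25, 64, 2))
smooth[:, :, 0] = (0.1 * t)[:, None]  # every landmark drifts along x
jittered = smooth + rng.normal(scale=0.05, size=smooth.shape)

smooth_feats = kinematic_stats(smooth)
jitter_feats = kinematic_stats(jittered)
print(jitter_feats["jerk_var"] > smooth_feats["jerk_var"])
```

On this synthetic data the jittered window shows much higher jerk variance than the smooth one, which is the kind of gap the abstract describes between generated and real lip trajectories. Real landmark tracks would also need normalization for face scale and head pose, which this sketch omits.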
