TRACE: Training-Free Partial Audio Deepfake Detection via Embedding Trajectory Analysis of Speech Foundation Models
Awais Khan, Muhammad Umar Farooq, Kutub Uddin, Khalid Malik

TL;DR
TRACE is a training-free method that detects partial audio deepfakes by analyzing embedding trajectory dynamics in speech foundation models, avoiding supervised training and generalizing across models.
Contribution
It introduces a novel, training-free approach leveraging embedding dynamics for partial deepfake detection, eliminating the need for labeled data or model retraining.
Findings
TRACE achieves an equal error rate (EER) of 8.08% on PartialSpoof, competitive with supervised methods.
It surpasses supervised baselines on LlamaPartialSpoof without target-domain data.
Embedding trajectory analysis effectively detects partial deepfakes across multiple benchmarks.
Abstract
Partial audio deepfakes, where synthesized segments are spliced into genuine recordings, are particularly deceptive because most of the audio remains authentic. Existing detectors are supervised: they require frame-level annotations, overfit to specific synthesis pipelines, and must be retrained as new generative models emerge. We argue that this supervision is unnecessary. We hypothesize that speech foundation models implicitly encode a forensic signal: genuine speech forms smooth, slowly varying embedding trajectories, while splice boundaries introduce abrupt disruptions in frame-level transitions. Building on this, we propose TRACE (Training-free Representation-based Audio Countermeasure via Embedding dynamics), a training-free framework that detects partial audio deepfakes by analyzing the first-order dynamics of frozen speech foundation model representations without any training,…
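The abstract describes the forensic signal only qualitatively: smooth trajectories for genuine speech, abrupt frame-to-frame jumps at splice boundaries. What follows is a minimal sketch of that idea under our own assumptions, not the authors' TRACE scoring rule. It takes the L2 norm of the step between consecutive L2-normalised frame embeddings as the "first-order dynamics", flags steps that are outliers under a robust (median/MAD) z-score, and scores an utterance by its largest step. The helper names (`trajectory_velocity`, `flag_transitions`, `utterance_score`, `smooth_trajectory`), the threshold of 6.0, and the synthetic random-walk embeddings standing in for real frozen-model features are all hypothetical.

```python
import numpy as np


def trajectory_velocity(emb: np.ndarray) -> np.ndarray:
    """First-order dynamics of a frame-level embedding trajectory.

    emb: (T, D) array, one embedding per frame. Returns a (T-1,) array whose
    entry t is the L2 norm of the step from frame t to frame t+1, computed
    after L2-normalising each frame so only changes in direction count.
    """
    unit = emb / np.maximum(np.linalg.norm(emb, axis=1, keepdims=True), 1e-12)
    return np.linalg.norm(np.diff(unit, axis=0), axis=1)


def flag_transitions(velocity: np.ndarray, z_thresh: float = 6.0) -> np.ndarray:
    """Indices of transitions that are outliers relative to this utterance.

    Hypothetical rule: a robust z-score (median / MAD) marks steps that are far
    larger than the utterance's own typical step as candidate splice boundaries.
    """
    med = np.median(velocity)
    mad = np.median(np.abs(velocity - med)) + 1e-12
    z = (velocity - med) / (1.4826 * mad)
    return np.flatnonzero(z > z_thresh)


def utterance_score(velocity: np.ndarray) -> float:
    """Utterance-level score: the single largest jump (higher = more suspicious)."""
    return float(velocity.max())


def smooth_trajectory(n_frames: int, dim: int, rng, step: float = 0.05) -> np.ndarray:
    """Synthetic stand-in for genuine speech: a slowly drifting random walk."""
    start = rng.standard_normal(dim)
    return start + np.cumsum(step * rng.standard_normal((n_frames, dim)), axis=0)


rng = np.random.default_rng(0)
genuine = smooth_trajectory(100, 64, rng)
spliced = genuine.copy()
# Frames 50 onward come from an unrelated "source", mimicking a spliced segment.
spliced[50:] = smooth_trajectory(50, 64, rng)

genuine_v = trajectory_velocity(genuine)
spliced_v = trajectory_velocity(spliced)
print(utterance_score(genuine_v), utterance_score(spliced_v))
print(flag_transitions(spliced_v))  # expected to include 49, the 49 -> 50 transition
```

In a real pipeline, `emb` would hold frame-level hidden states from a frozen speech foundation model; which model and layer to use, and how scores are calibrated into a decision, are not specified in the text above and are left open in this sketch.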
