Lightweight Model Attribution and Detection of Synthetic Speech via Audio Residual Fingerprints
Matías Pizarro, Mike Laszkiewicz, Dorothea Kolossa, Asja Fischer

TL;DR
This paper introduces a lightweight, training-free method that detects synthetic speech and attributes it to its source model using residual fingerprints, reporting AUROC scores above 99% and robustness to common audio distortions.
Contribution
The paper presents a novel residual fingerprint approach for synthetic speech detection and attribution that is simple, model-agnostic, and effective without training.
Findings
AUROC scores above 99% across multiple systems
High robustness to audio distortions like echo and noise
Effective out-of-domain detection with F1 score of 0.91
Abstract
As speech generation technologies advance, so do risks of impersonation, misinformation, and spoofing. We present a lightweight, training-free approach for detecting synthetic speech and attributing it to its source model. Our method addresses three tasks: (1) single-model attribution in an open-world setting, (2) multi-model attribution in a closed-world setting, and (3) real vs. synthetic speech classification. The core idea is simple: we compute standardized average residuals (the difference between an audio signal and its filtered version) to extract model-agnostic fingerprints that capture synthesis artifacts. Experiments across multiple synthesis systems and languages show AUROC scores above 99%, with strong reliability even when only a subset of model outputs is available. The method maintains high performance under common audio distortions, including echo and moderate background…
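The residual-fingerprint idea described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the moving-average filter, the log-magnitude spectrum features, the correlation score, and every function name (`residual`, `features`, `fingerprint`, `score`, `attribute`, `toy_clip`) are hypothetical choices made here, since the abstract does not specify them. The two "models" are toy noise generators with opposite spectral tilts, standing in for real synthesis systems.

```python
import numpy as np


def residual(x, width=5):
    """Signal minus a smoothed copy of itself.

    Assumption: a moving-average filter; the source does not name the filter.
    """
    kernel = np.ones(width) / width
    return x - np.convolve(x, kernel, mode="same")


def features(x, width=5):
    """Log-magnitude spectrum of the residual (an assumed representation)."""
    return np.log(np.abs(np.fft.rfft(residual(x, width))) + 1e-8)


def standardize(v):
    """Shift to zero mean and scale to (approximately) unit variance."""
    return (v - v.mean()) / (v.std() + 1e-12)


def fingerprint(signals, width=5):
    """Standardized average of residual features over one model's outputs."""
    return standardize(np.mean([features(s, width) for s in signals], axis=0))


def score(x, fp, width=5):
    """Correlation between one clip's standardized features and a fingerprint."""
    return float(np.mean(standardize(features(x, width)) * fp))


def attribute(x, fingerprints, width=5):
    """Closed-world attribution: name of the best-matching fingerprint."""
    return max(fingerprints, key=lambda name: score(x, fingerprints[name], width))


# Toy stand-ins for two synthesis systems: white noise with opposite tilts.
rng = np.random.default_rng(0)


def toy_clip(sign, n=1024):
    noise = rng.standard_normal(n + 1)
    return noise[1:] + sign * 0.8 * noise[:-1]


fingerprints = {
    "model_a": fingerprint([toy_clip(-1.0) for _ in range(20)]),
    "model_b": fingerprint([toy_clip(+1.0) for _ in range(20)]),
}
print(attribute(toy_clip(-1.0), fingerprints))
print(attribute(toy_clip(+1.0), fingerprints))
```

No training is involved: a fingerprint is just an average over a model's outputs. In this sketch, the closed-world task picks the highest-scoring fingerprint, while an open-world variant would presumably compare `score` for a single fingerprint against a threshold chosen on held-out data.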
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Hate Speech and Cyberbullying Detection
Methods: Sparse Evolutionary Training
