Lightweight Model Attribution and Detection of Synthetic Speech via Audio Residual Fingerprints

Matías Pizarro; Mike Laszkiewicz; Dorothea Kolossa; Asja Fischer

arXiv:2411.14013 · eess.AS · December 12, 2025

TL;DR

This paper introduces a lightweight, training-free method for detecting synthetic speech and attributing it to its source model using residual fingerprints. The method achieves AUROC scores above 99% and remains robust to common distortions such as echo and moderate background noise.

Contribution

The paper presents a novel residual-fingerprint approach to synthetic speech detection and attribution that is simple, model-agnostic, and requires no training.

Findings

1. AUROC scores above 99% across multiple synthesis systems
2. High robustness to audio distortions such as echo and noise
3. Effective out-of-domain detection, with an F1 score of 0.91
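
For reference, the two metrics named in the findings can be computed as below. This is a sketch of the standard definitions (AUROC through its pairwise-ranking interpretation, F1 from confusion counts), not the paper's evaluation code; which class is treated as positive and how scores are produced are not described in this summary, and the function names are hypothetical.

```python
def auroc(pos_scores, neg_scores):
    """AUROC as P(random positive scores higher than random negative); ties count 1/2."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def f1_score(y_true, y_pred):
    """F1 = 2TP / (2TP + FP + FN) for binary labels, where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


print(auroc([0.9, 0.3], [0.5, 0.1]))         # 3 of 4 pairs ranked correctly -> 0.75
print(f1_score([1, 1, 0, 0], [1, 0, 1, 0]))  # precision 0.5, recall 0.5 -> 0.5
```

An AUROC above 0.99 therefore means that nearly every positive/negative pair is ranked correctly by the detector's score, independent of any particular decision threshold, whereas F1 depends on the threshold chosen to turn scores into labels.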

Abstract

As speech generation technologies advance, so do risks of impersonation, misinformation, and spoofing. We present a lightweight, training-free approach for detecting synthetic speech and attributing it to its source model. Our method addresses three tasks: (1) single-model attribution in an open-world setting, (2) multi-model attribution in a closed-world setting, and (3) real vs. synthetic speech classification. The core idea is simple: we compute standardized average residuals (the difference between an audio signal and its filtered version) to extract model-agnostic fingerprints that capture synthesis artifacts. Experiments across multiple synthesis systems and languages show AUROC scores above 99%, with strong reliability even when only a subset of model outputs is available. The method maintains high performance under common audio distortions, including echo and moderate background…
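
The abstract states the core idea only at a high level (residual = signal minus its filtered version, averaged and standardized into a fingerprint), and the excerpt is truncated before the filter, features, and scoring rule are specified. The toy, pure-Python sketch below illustrates the idea under explicit assumptions and is not the authors' implementation: the centered moving-average filter, the per-frame magnitude spectrum of the residual, the per-bin mean/std standardization, the diagonal Mahalanobis-style distance, and every function name (`residual`, `fit_fingerprint`, `attribute`, `is_from_model`, `toy_clip`, and so on) are hypothetical choices. It covers closed-world attribution (nearest fingerprint) and open-world single-model attribution (thresholded distance). Real vs. synthetic classification could plausibly reuse the same machinery, for example by fitting a fingerprint to real speech.

```python
# Toy sketch: filter, features, and distance below are ASSUMPTIONS, not the paper's exact method.
import cmath
import math
import random
from statistics import fmean, pstdev


def moving_average(x, k=5):
    """Assumed low-pass filter: centered moving average of width k (shorter window at edges)."""
    half = k // 2
    out = []
    for i in range(len(x)):
        window = x[max(0, i - half):i + half + 1]
        out.append(sum(window) / len(window))
    return out


def residual(x, k=5):
    """Residual = signal minus its filtered version, leaving mostly high-frequency content."""
    return [a - b for a, b in zip(x, moving_average(x, k))]


def magnitude_spectrum(frame):
    """|DFT| of one frame for bins 0..n/2 (naive O(n^2) DFT, adequate for a toy example)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n)))
            for f in range(n // 2 + 1)]


def clip_feature(x, frame_len=32, k=5):
    """Average residual magnitude spectrum over non-overlapping frames of one clip."""
    r = residual(x, k)
    frames = [r[i:i + frame_len] for i in range(0, len(r) - frame_len + 1, frame_len)]
    spectra = [magnitude_spectrum(fr) for fr in frames]
    return [fmean(col) for col in zip(*spectra)]


def fit_fingerprint(clips, eps=1e-8):
    """Fingerprint of one source: per-bin mean and std of clip features over its outputs."""
    feats = [clip_feature(c) for c in clips]
    mu = [fmean(col) for col in zip(*feats)]
    sigma = [pstdev(col) + eps for col in zip(*feats)]
    return mu, sigma


def distance(x, fingerprint):
    """Mean squared standardized deviation from a fingerprint (diagonal Mahalanobis-style)."""
    mu, sigma = fingerprint
    f = clip_feature(x)
    return fmean(((a - m) / s) ** 2 for a, m, s in zip(f, mu, sigma))


def is_from_model(x, fingerprint, threshold):
    """Open-world, single-model check: accept x if its distance falls below a chosen threshold."""
    return distance(x, fingerprint) < threshold


def attribute(x, fingerprints):
    """Closed-world attribution: return the name of the nearest fingerprint."""
    return min(fingerprints, key=lambda name: distance(x, fingerprints[name]))


def toy_clip(rng, artifact_bin, n=256, frame_len=32):
    """Hypothetical 'model output': smooth tone + noise + a model-specific high-frequency tone."""
    p1, p2 = rng.uniform(0, 2 * math.pi), rng.uniform(0, 2 * math.pi)
    return [math.sin(2 * math.pi * t / frame_len + p1)
            + 0.5 * math.sin(2 * math.pi * artifact_bin * t / frame_len + p2)
            + rng.gauss(0, 0.1)
            for t in range(n)]


# Usage: fit one fingerprint per (toy) model, then attribute a new clip.
rng = random.Random(0)
fingerprints = {
    "model_A": fit_fingerprint([toy_clip(rng, 8) for _ in range(20)]),
    "model_B": fit_fingerprint([toy_clip(rng, 12) for _ in range(20)]),
}
print(attribute(toy_clip(rng, 8), fingerprints))
```

The filter width, frame length, and any threshold passed to `is_from_model` are arbitrary here. In an open-world setting the threshold would presumably be calibrated on held-out outputs of the target model, since no outputs of unknown models are available when the fingerprint is fitted.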
