Understanding the strengths and weaknesses of SSL models for audio deepfake model attribution

Gabriel Pîrlogeanu; Adriana Stan; Horia Cucu

arXiv:2603.13488·eess.AS·March 17, 2026

TL;DR

This paper systematically investigates how self-supervised learning features capture model signatures in audio deepfakes, revealing their robustness, biases, and limitations in attribution tasks.

Contribution

It provides a detailed analysis of the factors influencing SSL-based audio deepfake attribution, highlighting its strengths and vulnerabilities.

Findings

01 SSL features effectively capture architectural signatures

02 Perturbations in model checkpoints and prompts affect attribution accuracy

03 SSL-based attribution has specific robustness and bias characteristics

Abstract

Audio deepfake model attribution aims to mitigate the misuse of synthetic speech by identifying the source model responsible for generating a given audio sample, enabling accountability and informing vendors. The task is challenging. Self-supervised learning (SSL)-derived acoustic features have demonstrated state-of-the-art attribution capabilities, yet the underlying factors driving their success and the limits of their discriminative power remain unclear. In this paper, we systematically investigate how SSL-derived features capture architectural signatures in audio deepfakes. By controlling multiple dimensions of the audio generation process, we reveal how subtle perturbations in model checkpoints, text prompts, vocoders, or speaker identity influence attribution. Our results provide new insights into the robustness, biases, and limitations of SSL-based deepfake attribution,…
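
To make the attribution setup concrete, the following is a minimal sketch of one way such an experiment could be organised. It is not the authors' method. It assumes frame-level feature vectors have already been extracted by some pretrained SSL speech encoder, mean-pools them into one embedding per utterance, attributes each utterance to the nearest class centroid by cosine similarity, and reports accuracy separately per generation condition (for example a changed vocoder or checkpoint). All function names, labels, and condition tags are hypothetical.

```python
import math


def mean_pool(frames):
    """Average a list of equal-length vectors into a single vector."""
    dim = len(frames[0])
    return [sum(f[i] for f in frames) / len(frames) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fit_centroids(embeddings, labels):
    """Compute one centroid per source-model label (hypothetical labels)."""
    groups = {}
    for emb, label in zip(embeddings, labels):
        groups.setdefault(label, []).append(emb)
    return {label: mean_pool(vecs) for label, vecs in groups.items()}


def attribute(embedding, centroids):
    """Return the label whose centroid is most similar to the embedding."""
    return max(centroids, key=lambda label: cosine(embedding, centroids[label]))


def accuracy_by_condition(samples, centroids):
    """Attribution accuracy per condition.

    samples: iterable of (embedding, true_label, condition) tuples, where
    condition is a free-form tag such as "seen" or "unseen_vocoder".
    """
    hits, totals = {}, {}
    for emb, label, cond in samples:
        totals[cond] = totals.get(cond, 0) + 1
        hits[cond] = hits.get(cond, 0) + (attribute(emb, centroids) == label)
    return {cond: hits[cond] / totals[cond] for cond in totals}
```

Grouping accuracy by condition is the point of the sketch: a drop in one group relative to the matched-condition group would indicate that the features are sensitive to that factor. A nearest-centroid rule is used here only because it is simple; any classifier over the pooled embeddings could take its place.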

Taxonomy

Topics: Speech Recognition and Synthesis · Hate Speech and Cyberbullying Detection · Speech and Audio Processing