Deepfake Forensic Analysis: Source Dataset Attribution and Legal Implications of Synthetic Media Manipulation

Massimiliano Cassia; Luca Guarnera; Mirko Casu; Ignazio Zangara; Sebastiano Battiato

arXiv:2505.11110·cs.CV·May 19, 2025

Open Access

TL;DR

This paper presents a forensic framework that attributes GAN-generated images to their training datasets (e.g., CelebA or FFHQ) using spectral, color, and local features, supporting legal and ethical verification.

Contribution

It introduces a novel interpretable feature analysis pipeline for dataset attribution of synthetic images, achieving high accuracy across multiple GAN architectures.

Findings

01. Supervised classifiers attain 98-99% accuracy in dataset attribution.

02. Frequency-domain features are most effective in capturing dataset-specific artifacts.

03. Dataset attribution can support legal actions like copyright enforcement and privacy protection.

Abstract

Synthetic media generated by Generative Adversarial Networks (GANs) pose significant challenges in verifying authenticity and tracing dataset origins, raising critical concerns in copyright enforcement, privacy protection, and legal compliance. This paper introduces a novel forensic framework for identifying the training dataset (e.g., CelebA or FFHQ) of GAN-generated images through interpretable feature analysis. By integrating spectral transforms (Fourier/DCT), color distribution metrics, and local feature descriptors (SIFT), our pipeline extracts discriminative statistical signatures embedded in synthetic outputs. Supervised classifiers (Random Forest, SVM, XGBoost) achieve 98-99% accuracy in binary classification (real vs. synthetic) and multi-class dataset attribution across diverse GAN architectures (StyleGAN, AttGAN, GDWCT, StarGAN, and StyleGAN2). Experimental results highlight…
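The abstract names spectral transforms and color distribution metrics as two of the feature families fed to the classifiers. As a rough illustration only, and not the authors' implementation, the sketch below computes an azimuthally averaged Fourier power spectrum and per-channel color histograms with NumPy. The function names, bin counts, and the random test image are all assumptions made for this example; the SIFT descriptors and the Random Forest, SVM, and XGBoost classifiers mentioned in the abstract are left out.

```python
import numpy as np

def radial_power_spectrum(gray, n_bins=16):
    """Azimuthally averaged log-power spectrum of a 2-D grayscale image.

    n_bins is an illustrative choice, not a value taken from the paper.
    """
    f = np.fft.fftshift(np.fft.fft2(gray))
    power = np.log1p(np.abs(f) ** 2)
    h, w = gray.shape
    yy, xx = np.indices((h, w))
    r = np.hypot(yy - h // 2, xx - w // 2)
    # Assign every frequency coefficient to one of n_bins concentric rings.
    ring = np.minimum((r / r.max() * n_bins).astype(int), n_bins - 1)
    sums = np.bincount(ring.ravel(), weights=power.ravel(), minlength=n_bins)
    counts = np.bincount(ring.ravel(), minlength=n_bins)
    return sums / np.maximum(counts, 1)

def color_histogram(rgb, n_bins=8):
    """Per-channel normalized intensity histograms, concatenated."""
    feats = []
    for c in range(3):
        hist, _ = np.histogram(rgb[..., c], bins=n_bins, range=(0.0, 1.0))
        feats.append(hist / hist.sum())
    return np.concatenate(feats)

def extract_features(rgb):
    """Concatenate spectral and color features for one RGB image in [0, 1]."""
    gray = rgb.mean(axis=2)
    return np.concatenate([radial_power_spectrum(gray), color_histogram(rgb)])

# Hypothetical stand-in for a real or GAN-generated image.
rng = np.random.default_rng(0)
image = rng.random((64, 64, 3))
features = extract_features(image)
print(features.shape)  # (40,)
```

A vector like this, computed per image, is the kind of input a supervised classifier could be trained on; which features and parameters the paper actually uses would need to be checked against the full text.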

Taxonomy

Topics: Digital Media Forensic Detection · Generative Adversarial Networks and Image Synthesis · Digital and Cyber Forensics

Methods: Support Vector Machine