Deepfake Forensic Analysis: Source Dataset Attribution and Legal Implications of Synthetic Media Manipulation
Massimiliano Cassia, Luca Guarnera, Mirko Casu, Ignazio Zangara, Sebastiano Battiato

TL;DR
This paper presents a forensic framework that accurately attributes GAN-generated images to their training datasets using spectral, color, and local features, aiding legal and ethical verification.
Contribution
It introduces a novel interpretable feature analysis pipeline for dataset attribution of synthetic images, achieving high accuracy across multiple GAN architectures.
Findings
Supervised classifiers attain 98-99% accuracy in dataset attribution.
Frequency-domain features are most effective in capturing dataset-specific artifacts.
Dataset attribution can support legal actions like copyright enforcement and privacy protection.
Abstract
Synthetic media generated by Generative Adversarial Networks (GANs) pose significant challenges in verifying authenticity and tracing dataset origins, raising critical concerns in copyright enforcement, privacy protection, and legal compliance. This paper introduces a novel forensic framework for identifying the training dataset (e.g., CelebA or FFHQ) of GAN-generated images through interpretable feature analysis. By integrating spectral transforms (Fourier/DCT), color distribution metrics, and local feature descriptors (SIFT), our pipeline extracts discriminative statistical signatures embedded in synthetic outputs. Supervised classifiers (Random Forest, SVM, XGBoost) achieve 98-99% accuracy in binary classification (real vs. synthetic) and multi-class dataset attribution across diverse GAN architectures (StyleGAN, AttGAN, GDWCT, StarGAN, and StyleGAN2). Experimental results highlight…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsDigital Media Forensic Detection · Generative Adversarial Networks and Image Synthesis · Digital and Cyber Forensics
MethodsSupport Vector Machine
