TL;DR
SPARK-IL is a spectral retrieval-augmented deepfake detection framework that leverages frequency-domain signatures and incremental learning to improve generalization across diverse generative models.
Contribution
It introduces a novel spectral analysis and retrieval-based approach combined with incremental learning for robust deepfake detection.
Findings
Achieves 94.6% mean accuracy on the UniversalFakeDetect benchmark.
Effectively generalizes to unseen generative models.
Utilizes frequency-domain signatures for improved detection robustness.
Abstract
Detecting AI-generated images remains a significant challenge because detectors trained on specific generators often fail to generalize to unseen models. While pixel-level artifacts vary across models, frequency-domain signatures exhibit greater consistency, providing a promising foundation for cross-generator detection. To address this, we propose SPARK-IL, a retrieval-augmented framework that combines dual-path spectral analysis with incremental learning. One path uses a partially frozen ViT-L/14 encoder for semantic representations, and a parallel path produces raw RGB pixel embeddings. Both paths undergo multi-band Fourier decomposition into four frequency bands. Each band is processed individually by Kolmogorov-Arnold Networks (KAN) with mixture-of-experts for band-specific transformations before the resulting spectral embeddings are fused via cross-attention with residual…
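As a rough illustration of what a multi-band Fourier decomposition into four bands could look like, the sketch below splits a 1-D embedding into four band-limited components using NumPy. It is not the authors' implementation: the function name `split_into_bands`, the equal-width band edges, and the use of a real FFT over the embedding dimension are all assumptions made for illustration, since the abstract does not specify how the bands are defined.

```python
import numpy as np


def split_into_bands(embedding: np.ndarray, num_bands: int = 4) -> np.ndarray:
    """Split a 1-D real embedding into `num_bands` band-limited components.

    Assumption (not from the paper): bands are contiguous, roughly
    equal-width ranges of real-FFT bins, ordered from low to high frequency.
    Returns an array of shape (num_bands, len(embedding)).
    """
    length = embedding.shape[-1]
    spectrum = np.fft.rfft(embedding)          # complex bins, low -> high frequency
    n_bins = spectrum.shape[-1]

    # Hypothetical band edges: evenly spaced over the available bins.
    edges = np.linspace(0, n_bins, num_bands + 1).astype(int)

    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        masked = np.zeros_like(spectrum)
        masked[lo:hi] = spectrum[lo:hi]        # keep only this band's bins
        bands.append(np.fft.irfft(masked, n=length))
    return np.stack(bands)
```

Because the band masks partition the spectrum and the Fourier transform is linear, the four components sum back to the original embedding. In a pipeline like the one described above, each row would then go to its own band-specific module.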
