CAE-Net: Generalized Deepfake Image Detection using Convolution and Attention Mechanisms with Spatial and Frequency Domain Features

Anindya Bhattacharjee; Kaidul Islam; Kafi Anan; Ashir Intesher; Abrar Assaeem Fuad; Utsab Saha; Hafiz Imtiaz

arXiv:2502.10682 · cs.CV · December 29, 2025

TL;DR

CAE-Net is an ensemble deep learning framework that combines spatial and frequency-domain features with convolution and attention mechanisms for generalized deepfake detection. It reports 94.46% accuracy and 97.60% AUC on the IEEE Signal Processing Cup 2025 dataset, which has a 5:1 fake-to-real class imbalance.
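
One common way to realize a weighted ensemble is to average the per-model fake probabilities with normalized weights. The sketch below is illustrative only: the function name `weighted_ensemble`, the example scores, the weights, and the 0.5 decision threshold are all assumptions, not values or code from the paper.

```python
import numpy as np

def weighted_ensemble(probs, weights):
    """Weighted average of per-model fake probabilities.

    probs:   shape (n_models, n_samples), entries in [0, 1]
    weights: shape (n_models,), non-negative (hypothetical values)
    """
    probs = np.asarray(probs, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if probs.ndim != 2 or weights.shape != (probs.shape[0],):
        raise ValueError("expected probs (n_models, n_samples) and weights (n_models,)")
    if np.any(weights < 0) or weights.sum() == 0:
        raise ValueError("weights must be non-negative and not all zero")
    weights = weights / weights.sum()  # normalize so the weights sum to 1
    return weights @ probs             # shape (n_samples,)

# Made-up scores from three hypothetical backbones on three images.
model_probs = [
    [0.9, 0.2, 0.6],
    [0.8, 0.4, 0.4],
    [0.7, 0.1, 0.5],
]
model_weights = [0.5, 0.3, 0.2]  # assumed weights, for illustration only

ensemble_probs = weighted_ensemble(model_probs, model_weights)
predictions = (ensemble_probs >= 0.5).astype(int)  # 1 = fake, 0 = real
```

In practice the weights would typically be chosen on a validation set; how CAE-Net sets them is not stated in this summary.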

Contribution

The paper presents CAE-Net, a deepfake detection model that integrates EfficientNet, DeiT, and ConvNeXt with wavelet features, along with a multistage disjoint-subset training strategy to handle class imbalance.
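
To make "wavelet features" concrete, here is a minimal sketch of a single-level 2D Haar transform in NumPy. The choice of the Haar wavelet, the single decomposition level, and the helper names `haar_dwt2` and `wavelet_channels` are assumptions for illustration; this summary does not say which wavelet or decomposition CAE-Net uses.

```python
import numpy as np

def haar_dwt2(image):
    """Single-level orthonormal 2D Haar transform of a grayscale image.

    Requires even height and width. Returns four subbands, each half
    the input size: approximation, row detail, column detail, diagonal.
    """
    x = np.asarray(image, dtype=float)
    if x.ndim != 2 or x.shape[0] % 2 or x.shape[1] % 2:
        raise ValueError("expected a 2D array with even height and width")
    a = x[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    approx = (a + b + c + d) / 2.0      # low-pass in both directions
    row_detail = (a + b - c - d) / 2.0  # top row minus bottom row
    col_detail = (a - b + c - d) / 2.0  # left column minus right column
    diagonal = (a - b - c + d) / 2.0    # diagonal difference
    return approx, row_detail, col_detail, diagonal

def wavelet_channels(image):
    """Stack the four subbands into a (4, H/2, W/2) array (hypothetical helper)."""
    return np.stack(haar_dwt2(image))

image = np.arange(16, dtype=float).reshape(4, 4)
features = wavelet_channels(image)  # shape (4, 2, 2)
```

The stacked subbands could be fed to a network as extra input channels alongside the RGB image, which is one plausible way to combine spatial and frequency-domain information.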

Findings

01. Achieved 94.46% accuracy and 97.60% AUC on the IEEE Signal Processing Cup 2025 dataset.

02. Outperformed conventional class-balancing methods in deepfake detection.

03. Demonstrated robustness against adversarial attacks.

Abstract

The spread of deepfakes poses significant security concerns, demanding reliable detection methods. However, diverse generation techniques and class imbalance in datasets create challenges. We propose CAE-Net, a Convolution- and Attention-based weighted Ensemble network combining spatial and frequency-domain features for effective deepfake detection. The architecture integrates EfficientNet, Data-Efficient Image Transformer (DeiT), and ConvNeXt with wavelet features to learn complementary representations. We evaluated CAE-Net on the diverse IEEE Signal Processing Cup 2025 (DF-Wild Cup) dataset, which has a 5:1 fake-to-real class imbalance. To address this, we introduce a multistage disjoint-subset training strategy, sequentially training the model on non-overlapping subsets of the fake class while retaining knowledge across stages. Our approach achieved 94.46% accuracy and a 97.60% …
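
The disjoint-subset idea described in the abstract can be sketched as index bookkeeping. The code below assumes that every stage reuses the full real set and that the number of stages matches the 5:1 ratio; neither detail is confirmed by the abstract, and `make_stage_indices` is a hypothetical helper, not the authors' code.

```python
import numpy as np

def make_stage_indices(real_idx, fake_idx, n_stages, seed=0):
    """Split the fake class into n_stages non-overlapping chunks.

    Each returned stage holds all real indices plus one fake chunk
    (reusing the full real set per stage is an assumption).
    """
    rng = np.random.default_rng(seed)
    real_idx = np.asarray(real_idx)
    fake_shuffled = rng.permutation(np.asarray(fake_idx))
    fake_chunks = np.array_split(fake_shuffled, n_stages)  # disjoint by construction
    return [np.concatenate([real_idx, chunk]) for chunk in fake_chunks]

# Toy dataset with a 5:1 fake-to-real ratio: 100 real and 500 fake samples.
real_idx = np.arange(0, 100)
fake_idx = np.arange(100, 600)
stages = make_stage_indices(real_idx, fake_idx, n_stages=5)

for stage_number, stage in enumerate(stages, start=1):
    # A training loop would continue fine-tuning the SAME model here,
    # so that weights learned in earlier stages carry over.
    print(f"stage {stage_number}: {len(stage)} samples")
```

With these toy numbers each stage is balanced (100 real and 100 fake samples), and every fake sample is seen exactly once across the five stages.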

Taxonomy

Methods: Absolute Position Encodings · Layer Normalization · RMSProp · Squeeze-and-Excitation Block · Byte Pair Encoding · Label Smoothing · Transformer · Batch Normalization · Inverted Residual Block