CAE-Net: Generalized Deepfake Image Detection using Convolution and Attention Mechanisms with Spatial and Frequency Domain Features
Anindya Bhattacharjee, Kaidul Islam, Kafi Anan, Ashir Intesher, Abrar Assaeem Fuad, Utsab Saha, Hafiz Imtiaz

TL;DR
CAE-Net is an ensemble deep learning framework that combines spatial and frequency-domain features with attention mechanisms for generalized deepfake detection, reporting high accuracy and robustness on an imbalanced dataset.
Contribution
The paper presents CAE-Net, a new deepfake detection model that integrates multiple architectures and wavelet features, along with a multistage training strategy for class imbalance.
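As a rough illustration of how a weighted ensemble can combine several architectures, the sketch below averages per-model "fake" probabilities using fixed weights. It is a minimal sketch under stated assumptions, not the paper's implementation: the function name `weighted_ensemble`, the example probabilities, the weights, and the 0.5 decision threshold are all hypothetical, and this summary does not say how CAE-Net chooses its weights.

```python
from typing import List, Sequence


def weighted_ensemble(probs: Sequence[Sequence[float]],
                      weights: Sequence[float]) -> List[float]:
    """Combine per-model fake-probabilities with a normalized weighted average.

    probs[m][i] is model m's probability that image i is fake.
    """
    if len(probs) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_samples = len(probs[0])
    if any(len(p) != n_samples for p in probs):
        raise ValueError("every model must score the same number of images")
    return [
        sum(w * p[i] for w, p in zip(weights, probs)) / total
        for i in range(n_samples)
    ]


# Hypothetical outputs of three backbones on three images (not real results).
efficientnet_p = [0.9, 0.2, 0.6]
deit_p = [0.8, 0.4, 0.4]
convnext_p = [0.7, 0.1, 0.5]
weights = [0.5, 0.25, 0.25]  # illustrative values, not taken from the paper

scores = weighted_ensemble([efficientnet_p, deit_p, convnext_p], weights)
labels = [int(s >= 0.5) for s in scores]  # 1 = fake, 0 = real (assumed threshold)
print(scores, labels)
```

Dividing by the weight total keeps the combined score in the range 0 to 1 even when the weights are not pre-normalized, so the same threshold applies regardless of how the weights are scaled.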
Findings
Achieved 94.46% accuracy and 97.60% AUC on the IEEE Signal Processing Cup 2025 dataset.
Outperformed conventional class-balancing methods in deepfake detection.
Demonstrated robustness against adversarial attacks.
Abstract
The spread of deepfakes poses significant security concerns, demanding reliable detection methods. However, diverse generation techniques and class imbalance in datasets create challenges. We propose CAE-Net, a Convolution- and Attention-based weighted Ensemble network combining spatial and frequency-domain features for effective deepfake detection. The architecture integrates EfficientNet, Data-Efficient Image Transformer (DeiT), and ConvNeXt with wavelet features to learn complementary representations. We evaluated CAE-Net on the diverse IEEE Signal Processing Cup 2025 (DF-Wild Cup) dataset, which has a 5:1 fake-to-real class imbalance. To address this, we introduce a multistage disjoint-subset training strategy, sequentially training the model on non-overlapping subsets of the fake class while retaining knowledge across stages. Our approach achieved accuracy and a …
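The multistage disjoint-subset idea described in the abstract can be sketched as a data-splitting step. The example below is a sketch under assumptions that this summary does not confirm: that the fake class is split into as many stages as the imbalance ratio (five for a 5:1 ratio), and that the full real class is reused at every stage. The helper name `make_stages` and the toy identifiers are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def make_stages(real_ids: Sequence[str],
                fake_ids: Sequence[str],
                n_stages: int,
                seed: int = 0) -> List[Tuple[List[str], List[str]]]:
    """Split fake_ids into n_stages non-overlapping chunks.

    Each stage pairs one fake chunk with all real samples (an assumption),
    so a 5:1 fake-to-real dataset yields roughly balanced stages when
    n_stages is 5.
    """
    if n_stages < 1:
        raise ValueError("n_stages must be at least 1")
    rng = random.Random(seed)
    shuffled = list(fake_ids)
    rng.shuffle(shuffled)
    # Strided slicing yields disjoint chunks whose sizes differ by at most one.
    chunks = [shuffled[i::n_stages] for i in range(n_stages)]
    return [(list(real_ids), chunk) for chunk in chunks]


# Toy dataset with the same 5:1 fake-to-real ratio as the one described.
real_ids = [f"real_{i}" for i in range(4)]
fake_ids = [f"fake_{i}" for i in range(20)]
stages = make_stages(real_ids, fake_ids, n_stages=5)

for stage_no, (reals, fakes) in enumerate(stages, start=1):
    # A real pipeline would continue training the same model here, starting
    # from the previous stage's weights so earlier knowledge is retained.
    print(f"stage {stage_no}: {len(reals)} real, {len(fakes)} fake")
```

With these toy sizes, every stage sees 4 real and 4 fake samples, and each fake sample appears in exactly one stage.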
Taxonomy
Methods: Absolute Position Encodings · Layer Normalization · RMSProp · Squeeze-and-Excitation Block · Byte Pair Encoding · Label Smoothing · Transformer · Batch Normalization · Inverted Residual Block
