TwoHead-SwinFPN: A Unified DL Architecture for Synthetic Manipulation, Detection and Localization in Identity Documents

Chan Naseeb; Adeel Ashraf Cheema; Hassan Sami; Tayyab Afzal; Muhammad Omair; Usman Habib

arXiv:2601.12895·cs.CV·January 21, 2026

Open Access

TL;DR

This paper introduces TwoHead-SwinFPN, a deep learning model that jointly classifies identity document images as genuine or manipulated and localizes the manipulated regions, targeting AI-generated forgeries such as face swapping and text inpainting.

Contribution

The paper presents a unified architecture that combines a Swin Transformer backbone, a Feature Pyramid Network (FPN), and a UNet-style decoder, trained with uncertainty-weighted multi-task learning for joint manipulation detection and localization in ID documents.

Findings

01

Achieves 84.31% accuracy and 90.78% AUC in classification on the FantasyIDiap dataset

02

Attains 57.24% mean Dice score in localization

03

Demonstrates robustness across multiple languages and devices
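
For readers unfamiliar with the localization metric in finding 02: the Dice score measures the overlap between a predicted mask A and a ground-truth mask B as 2|A∩B| / (|A| + |B|), and a mean Dice score averages this over images. The following is a minimal sketch in plain Python, not the authors' evaluation code. The function names `dice_score` and `mean_dice`, the nested-list mask format, and the convention of scoring two empty masks as 1.0 are all assumptions made for illustration.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks (nested lists of 0/1)."""
    flat_pred = [p for row in pred for p in row]
    flat_target = [t for row in target for t in row]
    # Pixels marked as manipulated in both masks.
    intersection = sum(p * t for p, t in zip(flat_pred, flat_target))
    total = sum(flat_pred) + sum(flat_target)
    if total == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / total


def mean_dice(pairs):
    """Average Dice over an iterable of (pred, target) mask pairs."""
    scores = [dice_score(pred, target) for pred, target in pairs]
    return sum(scores) / len(scores)


# Hypothetical 2x2 masks: the prediction covers two pixels, the truth one.
pred = [[1, 1], [0, 0]]
target = [[1, 0], [0, 0]]
score = dice_score(pred, target)  # 2 * 1 / (2 + 1)
```

How empty masks are scored matters for a dataset that mixes genuine and manipulated documents, because genuine images have no manipulated pixels. The source does not say which convention the paper uses.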

Abstract

The proliferation of sophisticated generative AI models has significantly escalated the threat of synthetic manipulations in identity documents, particularly through face swapping and text inpainting attacks. This paper presents TwoHead-SwinFPN, a unified deep learning architecture that simultaneously performs binary classification and precise localization of manipulated regions in ID documents. Our approach integrates a Swin Transformer backbone with a Feature Pyramid Network (FPN) and a UNet-style decoder, enhanced with a Convolutional Block Attention Module (CBAM) for improved feature representation. The model employs a dual-head architecture for joint optimization of detection and segmentation tasks, utilizing uncertainty-weighted multi-task learning. Extensive experiments on the FantasyIDiap dataset demonstrate superior performance with 84.31% accuracy, 90.78% AUC for classification,…
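
The abstract does not spell out how the two task losses are weighted. One commonly used simplified form of uncertainty-weighted multi-task learning gives each task a learnable log-variance s_i and minimizes the sum of exp(-s_i) * L_i + s_i over tasks, so a task with high estimated uncertainty is down-weighted while the additive s_i term stops the weights from collapsing to zero. The sketch below assumes that form. The function name, the argument names, and the example loss values are hypothetical, and the authors' exact formulation may differ.

```python
import math


def uncertainty_weighted_loss(cls_loss, seg_loss, log_var_cls, log_var_seg):
    """Combine a classification and a segmentation loss.

    Assumed form: sum_i exp(-s_i) * L_i + s_i, where s_i is a learnable
    log-variance per task (a plain float here, for illustration only).
    """
    cls_term = math.exp(-log_var_cls) * cls_loss + log_var_cls
    seg_term = math.exp(-log_var_seg) * seg_loss + log_var_seg
    return cls_term + seg_term


# With both log-variances at 0 the result is the plain sum of the losses.
equal_weighting = uncertainty_weighted_loss(0.7, 0.4, 0.0, 0.0)

# Raising the segmentation log-variance shrinks that task's weight
# (exp(-1.0) is about 0.37) and adds a penalty of 1.0.
down_weighted_seg = uncertainty_weighted_loss(0.7, 0.4, 0.0, 1.0)
```

In a training loop the two log-variances would be trainable parameters optimized together with the network weights; they are ordinary floats here to keep the example dependency-free.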

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Digital Media Forensic Detection · Adversarial Robustness in Machine Learning