Distilled Transformers with Locally Enhanced Global Representations for Face Forgery Detection
Yaning Zhang, Qiufu Li, Zitong Yu, Linlin Shen

TL;DR
This paper introduces a novel hybrid transformer network with local and global feature enhancement, a mixture of experts, and self-distillation for improved face forgery detection, outperforming existing methods across multiple datasets.
Contribution
It proposes a distilled transformer architecture with locally-enhanced global representations, a mixture of experts, and a lightweight attention module, addressing issues like attention collapse and soft label scarcity.
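To make the soft-label idea concrete, here is a minimal sketch of a generic knowledge-distillation loss in plain Python. It is not the paper's formulation. The names `softmax`, `distillation_loss`, `teacher_logits`, `temperature`, and `alpha` are illustrative assumptions, and `teacher_logits` is only a stand-in for whatever source of soft labels a self-distillation scheme provides.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives softer probabilities."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, hard_label,
                      temperature=2.0, alpha=0.5):
    """Generic soft-label distillation loss (assumed form, not taken from the paper).

    Combines KL(teacher || student) on temperature-softened distributions
    with cross-entropy on the hard real/fake label.
    """
    soft_targets = softmax(teacher_logits, temperature)  # the "soft labels"
    student_soft = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s)
             for t, s in zip(soft_targets, student_soft) if t > 0)
    ce = -math.log(softmax(student_logits)[hard_label])
    # temperature**2 keeps the soft term's gradient scale comparable to the hard term
    return alpha * (temperature ** 2) * kl + (1 - alpha) * ce

# Two classes (index 0 = real, index 1 = fake); the logits are made-up values.
loss = distillation_loss([0.3, 1.2], [0.1, 2.0], hard_label=1)
```

When the student and teacher logits are identical, the KL term vanishes and only the weighted cross-entropy on the hard label remains.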
Findings
Outperforms state-of-the-art methods on five deepfake datasets.
Effectively captures local and global forgery traces.
Reduces attention collapse with lightweight modules.
Abstract
Face forgery detection (FFD) aims to determine the authenticity of face images. Although current CNN-based works achieve outstanding performance in FFD, they tend to capture only the local forgery patterns generated by various manipulation methods. Transformer-based detectors model global dependencies better, but they are weak at exploring local forgery artifacts. Hybrid transformer-based networks are designed to capture both local and global manipulation traces, but they tend to suffer from attention collapse as the transformer blocks go deeper. In addition, soft labels are rarely available. In this paper, we propose a distilled transformer network (DTN) to capture rich local and global forgery traces and to learn general, common representations across different forged faces. Specifically, we design a mixture-of-experts (MoE) module to mine…
Taxonomy
Topics: Face recognition and analysis · Biometric Identification and Security · Generative Adversarial Networks and Image Synthesis
Methods: Linear Layer · Multi-Head Attention · Layer Normalization · Softmax · Attention Is All You Need · Dense Connections · Residual Connection · Vision Transformer
