Distilled Transformers with Locally Enhanced Global Representations for Face Forgery Detection

Yaning Zhang; Qiufu Li; Zitong Yu; Linlin Shen

arXiv:2412.20156·cs.CV·December 31, 2024

TL;DR

This paper introduces a hybrid transformer network that combines local and global feature enhancement, a mixture of experts, and self-distillation for face forgery detection, and reports outperforming existing methods on five deepfake datasets.

Contribution

The paper proposes a distilled transformer architecture with locally enhanced global representations, a mixture of experts, and a lightweight attention module. It targets two problems: attention collapse in deeper transformer blocks and the scarcity of soft labels.
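To make the soft-label idea concrete, here is a minimal sketch of a generic knowledge-distillation loss in plain Python. It is not the paper's formulation, which this page does not give. The function names, the `temperature` and `alpha` parameters, and the two-class (real vs. fake) setup are assumptions for illustration. A teacher's temperature-softened probabilities serve as soft labels and are blended with the usual hard-label cross-entropy.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, hard_label,
                      temperature=2.0, alpha=0.5):
    """Blend hard-label cross-entropy with KL(teacher || student) on soft labels.

    Hypothetical helper: alpha weights the soft-label term, and the
    temperature**2 factor is the usual gradient-scale correction.
    """
    # Hard-label term: cross-entropy of the student against the ground truth.
    p_student = softmax(student_logits)
    ce = -math.log(p_student[hard_label])

    # Soft-label term: teacher probabilities act as the soft labels.
    q_teacher = softmax(teacher_logits, temperature)
    p_soft = softmax(student_logits, temperature)
    kl = sum(q * math.log(q / p) for q, p in zip(q_teacher, p_soft) if q > 0)

    return (1 - alpha) * ce + alpha * (temperature ** 2) * kl

# Example: class 0 = real, class 1 = fake (an assumed labeling).
loss = distillation_loss([1.2, -0.3], [2.0, -1.0], hard_label=0)
```

In a self-distillation setup the teacher and student share an architecture, so the teacher logits would come from another copy or an earlier state of the same model rather than from a separate, larger network.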

Findings

01. Outperforms state-of-the-art on five deepfake datasets.

02. Effectively captures local and global forgery traces.

03. Reduces attention collapse with lightweight modules.

Abstract

Face forgery detection (FFD) aims to determine the authenticity of face images. Although current CNN-based works achieve outstanding performance in FFD, they tend to latch onto local forgery patterns produced by specific manipulation methods. Transformer-based detectors are better at modeling global dependencies, but they are not good at exploring local forgery artifacts. Hybrid transformer-based networks are designed to capture both local and global manipulation traces, but they tend to suffer from attention collapse as the transformer blocks go deeper. In addition, soft labels are rarely available. In this paper, we propose a distilled transformer network (DTN) to capture both rich local and global forgery traces and to learn general, common representations across different forged faces. Specifically, we design a mixture-of-experts (MoE) module to mine…
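The abstract is cut off before it describes the MoE module, so the following is only a generic mixture-of-experts sketch in plain Python, not the authors' design. The names `moe_forward`, `experts`, and `gate_weights`, and the linear softmax gate, are assumptions for illustration. Each expert transforms the input, and a gate decides how much each expert contributes to the mixed output.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, experts, gate_weights):
    """Dense mixture of experts: gate-weighted sum of every expert's output.

    x            : input feature vector (list of floats)
    experts      : list of callables, each mapping a vector to a vector
    gate_weights : one weight vector per expert; gate score = dot(w, x)
    """
    # Gate: one scalar score per expert, normalized to mixing weights.
    scores = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in gate_weights]
    gates = softmax(scores)

    # Run every expert, then mix the outputs dimension by dimension.
    outputs = [expert(x) for expert in experts]
    dim = len(outputs[0])
    mixed = [sum(g * out[d] for g, out in zip(gates, outputs)) for d in range(dim)]
    return mixed, gates

# Two toy experts: identity and doubling. Zero gate weights give uniform gates.
experts = [lambda v: list(v), lambda v: [2.0 * a for a in v]]
mixed, gates = moe_forward([1.0, 2.0], experts, [[0.0, 0.0], [0.0, 0.0]])
```

With uniform gates the output is the plain average of the expert outputs. A learned gate would instead shift weight toward whichever expert suits the input.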

Taxonomy

Topics: Face recognition and analysis · Biometric Identification and Security · Generative Adversarial Networks and Image Synthesis

Methods: Linear Layer · Multi-Head Attention · Layer Normalization · Softmax · Attention Is All You Need · Dense Connections · Residual Connection · Vision Transformer