Texture, Shape, Order, and Relation Matter: A New Transformer Design for Sequential DeepFake Detection
Yunfei Li, Yuezun Li, Baoyuan Wu, Junyu Dong, Guopu Zhu, Siwei Lyu

TL;DR
This paper introduces TSOM, a Transformer architecture for sequential DeepFake detection built around three cues: the texture, shape, and order of manipulations. An extended version, TSOM++, additionally models the relations between manipulations. Both are reported to outperform existing methods on benchmark datasets.
Contribution
The paper proposes a new Transformer design, TSOM, with four key innovations for improved sequential DeepFake detection, and extends it to TSOM++ by incorporating manipulation relation modeling.
Findings
TSOM outperforms existing methods on benchmark datasets.
The texture-aware branch effectively captures subtle manipulation traces.
Modeling manipulation relations with contrastive learning further improves detection.
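The last finding refers to contrastive learning, whose exact formulation is not given in this summary. As a generic illustration only, a contrastive objective pulls an anchor embedding toward a "positive" (here, hypothetically, an embedding of a related manipulation) and pushes it away from "negatives". The sketch below uses a standard InfoNCE-style loss; the function names, the temperature value, and the use of cosine similarity are assumptions, not the authors' design.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss for one anchor (generic sketch, not the paper's loss).

    The loss is small when the anchor is closer to the positive than to
    every negative, and large otherwise.
    """
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Hypothetical 2-D embeddings: a well-aligned pair versus a misaligned one.
aligned = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
misaligned = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
```

With these toy vectors, the aligned pair yields a loss close to zero, while the misaligned pair yields a much larger one, which is the behavior a contrastive objective relies on.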
Abstract
Sequential DeepFake detection is an emerging task that predicts the manipulation sequence in order. Existing methods typically formulate it as an image-to-sequence problem, employing conventional Transformer architectures. However, these methods lack dedicated design and consequently result in limited performance. As such, this paper describes a new Transformer design, called TSOM, by exploring three perspectives: Texture, Shape, and Order of Manipulations. Our method features four major improvements: (1) We describe a new texture-aware branch that effectively captures subtle manipulation traces with a Diversiform Pixel Difference Attention module. (2) Then we introduce a Multi-source Cross-attention module to seek deep correlations among spatial and sequential features, enabling effective modeling of complex manipulation traces. (3) To further enhance the…
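The abstract names a Diversiform Pixel Difference Attention module but does not describe its internals here. As background only, pixel-difference operators generally respond to local intensity changes rather than absolute intensities, which is why they are often used to expose fine texture artifacts. The sketch below shows one common variant, a central pixel difference over a 3x3 neighborhood with unit weights; the function name and the unit-weight choice are assumptions for illustration and should not be read as the authors' module.

```python
def central_pixel_difference(img):
    """Sum of (neighbor - center) over each 3x3 neighborhood.

    `img` is a 2-D list of numbers. Border pixels are skipped, so the
    output has shape (H - 2) x (W - 2). Unit weights are an assumption;
    a learned operator would weight each difference.
    """
    height, width = len(img), len(img[0])
    out = []
    for r in range(1, height - 1):
        row = []
        for c in range(1, width - 1):
            center = img[r][c]
            diff = sum(
                img[r + dr][c + dc] - center
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
            )
            row.append(diff)
        out.append(row)
    return out


# A flat patch produces no response; an isolated bright pixel produces a strong one.
flat_response = central_pixel_difference([[5] * 4 for _ in range(4)])
spike_response = central_pixel_difference([[0, 0, 0], [0, 9, 0], [0, 0, 0]])
```

Because the operator depends only on differences, a uniformly lit region gives zero everywhere, while a localized intensity change stands out regardless of overall brightness.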
Taxonomy
Topics: Industrial Vision Systems and Defect Detection
Methods: Attention Is All You Need · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Absolute Position Encodings · Concatenated Skip Connection · Dropout · Dense Connections · Label Smoothing · Residual Connection · Softmax
