Texture, Shape, Order, and Relation Matter: A New Transformer Design for Sequential DeepFake Detection

Yunfei Li; Yuezun Li; Baoyuan Wu; Junyu Dong; Guopu Zhu; Siwei Lyu

arXiv:2404.13873·cs.CV·July 30, 2025

Open Access

TL;DR

This paper introduces TSOM, a novel Transformer architecture for sequential DeepFake detection that leverages texture, shape, and order cues, with an extended version TSOM++ that also models manipulation relations, achieving superior performance.

Contribution

The paper proposes a new Transformer design, TSOM, with four key innovations for improved sequential DeepFake detection, and extends it to TSOM++ by incorporating manipulation relation modeling.

Findings

01 TSOM outperforms existing methods on benchmark datasets.

02 The texture-aware branch effectively captures subtle manipulation traces.

03 Modeling manipulation relations with contrastive learning further improves detection.

Abstract

Sequential DeepFake detection is an emerging task that predicts the manipulation sequence in order. Existing methods typically formulate it as an image-to-sequence problem and employ conventional Transformer architectures. However, these methods lack dedicated design and consequently deliver limited performance. This paper therefore describes a new Transformer design, called TSOM, that explores three perspectives: Texture, Shape, and Order of Manipulations. Our method features four major improvements: (1) We describe a new texture-aware branch that effectively captures subtle manipulation traces with a Diversiform Pixel Difference Attention module. (2) We then introduce a Multi-source Cross-attention module to seek deep correlations among spatial and sequential features, enabling effective modeling of complex manipulation traces. (3) To further enhance the…
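To make the texture cue mentioned in the abstract more concrete, the sketch below computes a generic central pixel difference: for each interior pixel, the summed difference between its eight neighbours and itself. This is only an illustration of the general pixel-difference idea under simplifying assumptions (unit weights, a single grayscale channel, no attention). It is not the paper's Diversiform Pixel Difference Attention module, and the function name `central_pixel_difference` is hypothetical.

```python
from typing import List

Image = List[List[float]]


def central_pixel_difference(img: Image) -> Image:
    """Sum of (neighbour - centre) over the 8-neighbourhood of each interior pixel.

    Assumption: unit weights and one channel; border pixels are left at 0.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            center = img[y][x]
            total = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    if dy == 0 and dx == 0:
                        continue  # skip the centre pixel itself
                    total += img[y + dy][x + dx] - center
            out[y][x] = total
    return out


# A constant patch has no local intensity change, so its response is zero.
flat = [[5.0] * 4 for _ in range(4)]
# A vertical step edge produces a strong response on both sides of the edge.
edge = [[0.0, 0.0, 9.0, 9.0] for _ in range(4)]

flat_response = central_pixel_difference(flat)
edge_response = central_pixel_difference(edge)
```

Because the operator responds to local intensity differences rather than absolute intensity, smooth regions are suppressed while fine local variations stand out, which is the intuition behind using difference-based features for subtle manipulation traces.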

Taxonomy

Topics: Industrial Vision Systems and Defect Detection

Methods: Attention Is All You Need · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Absolute Position Encodings · Concatenated Skip Connection · Dropout · Dense Connections · Label Smoothing · Residual Connection · Softmax