G$^2$V$^2$former: Graph Guided Video Vision Transformer for Face Anti-Spoofing

Jingyi Yang; Zitong Yu; Xiuming Ni; Jia He; Hui Li

arXiv:2408.07675 · cs.CV · February 25, 2025 · 2 cites

TL;DR

This paper introduces G$^2$V$^2$former, a novel graph-guided video vision transformer that fuses photometric and dynamic facial features for improved face anti-spoofing, capturing both spatial and temporal cues effectively.

Contribution

It proposes a new spatiotemporal attention mechanism with Kronecker temporal attention and leverages facial landmarks to enhance dynamic feature extraction in face anti-spoofing.

Findings

01. Achieves superior performance on nine benchmark datasets.

02. Effectively captures dynamic and photometric cues for spoofing detection.

03. Outperforms existing methods in various scenarios.

Abstract

In videos containing spoofed faces, spoofing evidence may be uncovered from photometric abnormalities, dynamic abnormalities, or a combination of both. Prevailing face anti-spoofing (FAS) approaches generally concentrate on the single-frame scenario; however, purely photometric-driven methods overlook the dynamic spoofing clues that may be exposed over time. This may lead FAS systems to incorrect judgments, especially in cases that are easy to distinguish in terms of dynamics but hard to discern in terms of photometrics. To this end, we propose the Graph Guided Video Vision Transformer (G$^2$V$^2$former), which combines faces with facial landmarks for photometric and dynamic feature fusion. We factorize the attention into space and time, and fuse them via a spatiotemporal block. Specifically, we design a novel temporal attention called Kronecker temporal…
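To make "factorize the attention into space and time" concrete, here is a minimal NumPy sketch of generic factorized space-time self-attention: tokens first attend to other patches within the same frame, then to the same patch position across frames. This is an illustration under stated assumptions, not the paper's implementation. It does not model the Kronecker temporal attention, the landmark graph branch, or the spatiotemporal fusion block, and the function names, single-head design, residual connections, and random weights are all hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x, w_q, w_k, w_v):
    # Single-head scaled dot-product self-attention.
    # x has shape (..., n, d); attention runs over the n axis.
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def factorized_space_time_attention(tokens, spatial_w, temporal_w):
    # tokens: (T, N, D) = frames, patch tokens per frame, channels.
    # Hypothetical helper: spatial attention, then temporal attention,
    # each with a residual connection (an assumption of this sketch).

    # Spatial step: the N tokens of each frame attend to each other.
    x = tokens + self_attention(tokens, *spatial_w)

    # Temporal step: for each patch position, the T frames attend to each other.
    xt = np.swapaxes(x, 0, 1)                  # (N, T, D)
    xt = xt + self_attention(xt, *temporal_w)
    return np.swapaxes(xt, 0, 1)               # back to (T, N, D)


rng = np.random.default_rng(0)
T, N, D = 4, 6, 8  # illustrative sizes only
tokens = rng.normal(size=(T, N, D))
spatial_w = [rng.normal(scale=D ** -0.5, size=(D, D)) for _ in range(3)]
temporal_w = [rng.normal(scale=D ** -0.5, size=(D, D)) for _ in range(3)]

out = factorized_space_time_attention(tokens, spatial_w, temporal_w)
print(out.shape)  # (4, 6, 8)
```

Factorizing this way means each attention step scores N or T tokens at a time rather than all T·N tokens jointly, which is the usual motivation for splitting attention into spatial and temporal parts.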

Taxonomy

Topics: Biometric Identification and Security · Organ and Tissue Transplantation Research · User Authentication and Security Systems

Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Multi-Head Attention · Position-Wise Feed-Forward Layer · Adam · Byte Pair Encoding · Softmax · Absolute Position Encodings · Vision Transformer