ViTransPAD: Video Transformer using convolution and self-attention for Face Presentation Attack Detection

Zuheng Ming; Zitong Yu; Musab Al-Ghadi; Muriel Visani; Muhammad Muzzamil Luqman; Jean-Christophe Burie

arXiv:2203.01562·cs.CV·March 15, 2022·1 cites

TL;DR

ViTransPAD introduces a novel video transformer architecture with multi-scale attention and convolution integration for improved face presentation attack detection, capturing both local details and long-range temporal dependencies.

Contribution

The paper proposes ViTransPAD, a new video transformer model with multi-scale self-attention and convolutional components, enhancing face PAD by learning fine-grained pixel-level discrimination.

Findings

1. Achieves superior accuracy in face PAD tasks.
2. Balances computational efficiency with detection performance.
3. Outperforms existing CNN and transformer-based methods.

Abstract

Face Presentation Attack Detection (PAD) is an important measure to prevent spoof attacks on face biometric systems. Many works based on Convolutional Neural Networks (CNNs) for face PAD formulate the problem as an image-level binary classification task without considering the context. Alternatively, Vision Transformers (ViT), which use self-attention to attend to the context of an image, have become mainstream in face PAD. Inspired by ViT, we propose a Video-based Transformer for face PAD (ViTransPAD) with short/long-range spatio-temporal attention, which can not only focus on local details with short-range attention within a frame but also capture long-range dependencies across frames. Instead of using coarse image patches at a single scale as in ViT, we propose the Multi-scale Multi-Head Self-Attention (MsMHSA) architecture to accommodate multi-scale patch partitions of Q, K, V feature maps to the…
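The abstract is truncated here, so the exact MsMHSA design is not visible on this page. As a rough illustration of the general idea only (multi-scale patch partitions of Q, K, V feature maps, with attention running across all frames), the NumPy sketch below assumes one patch size per head and takes Q, K, V as precomputed feature maps of shape (frames, height, width, channels). The function names, the per-head channel split, and the omission of learned or convolutional projections are all assumptions made for this example, not the paper's implementation.

```python
import numpy as np

def softmax(a, axis=-1):
    # Numerically stable softmax along one axis.
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def to_patches(x, p):
    # (T, H, W, C) -> (T * H/p * W/p, p*p*C): one token per p x p patch.
    T, H, W, C = x.shape
    x = x.reshape(T, H // p, p, W // p, p, C)
    x = x.transpose(0, 1, 3, 2, 4, 5)  # (T, H/p, W/p, p, p, C)
    return x.reshape(T * (H // p) * (W // p), p * p * C)

def from_patches(tokens, p, shape):
    # Inverse of to_patches: tokens back to a (T, H, W, C) feature map.
    T, H, W, C = shape
    x = tokens.reshape(T, H // p, W // p, p, p, C)
    x = x.transpose(0, 1, 3, 2, 4, 5)  # (T, H/p, p, W/p, p, C)
    return x.reshape(T, H, W, C)

def multi_scale_attention(q, k, v, patch_sizes):
    # Hypothetical sketch: head h attends over patches of size patch_sizes[h].
    # Tokens from every frame share one attention matrix, so a small patch
    # size gives fine local tokens while attention still spans all frames.
    T, H, W, C = q.shape
    n_heads = len(patch_sizes)
    assert C % n_heads == 0, "channels must split evenly across heads"
    c = C // n_heads
    outs = []
    for h, p in enumerate(patch_sizes):
        assert H % p == 0 and W % p == 0, "patch size must divide H and W"
        sl = slice(h * c, (h + 1) * c)
        qt, kt, vt = (to_patches(m[..., sl], p) for m in (q, k, v))
        scores = qt @ kt.T / np.sqrt(qt.shape[1])  # scaled dot product
        attn = softmax(scores, axis=-1)
        outs.append(from_patches(attn @ vt, p, (T, H, W, c)))
    return np.concatenate(outs, axis=-1)  # concatenate heads on channels
```

For example, calling `multi_scale_attention(x, x, x, [2, 4])` on an array `x` of shape (2, 8, 8, 4) returns an array of the same shape, with the first two channels attended at patch size 2 and the last two at patch size 4.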

Taxonomy

Topics: Biometric Identification and Security · Face recognition and analysis

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dense Connections · Absolute Position Encodings · Byte Pair Encoding · Softmax · Position-Wise Feed-Forward Layer · Residual Connection · Layer Normalization