Adaptive-avg-pooling based Attention Vision Transformer for Face Anti-spoofing

Jichen Yang; Fangfan Chen; Rohan Kumar Das; Zhengyu Zhu; Shunsi Zhang

arXiv:2401.04953 · eess.IV · January 11, 2024 · ICASSP · 1 citation

TL;DR

This paper introduces AAViT, a vision transformer that replaces plain averaging in the classification head with adaptive average pooling and attention modules. This change improves face anti-spoofing performance because useful features are better preserved than in the traditional vision transformer.

Contribution

The paper proposes a new vision transformer architecture, AAViT. AAViT replaces the average value computing module in the MLP classification head with adaptive average pooling and attention modules to improve face anti-spoofing.
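The replacement described above can be illustrated with a minimal pure-Python sketch. It is built on assumptions and is not the authors' implementation. Adaptive average pooling first reduces a variable-length token sequence to a fixed number of windows. The sketch uses the window-boundary rule of PyTorch's AdaptiveAvgPool1d. An attention step then combines those windows with softmax weights instead of a uniform mean. The abstract does not specify the attention module. Here it is swapped for a simple single-vector scoring scheme, and `score_weights` is a hypothetical stand-in for learned parameters.

```python
import math
from typing import List

Vector = List[float]


def adaptive_avg_pool_1d(tokens: List[Vector], out_size: int) -> List[Vector]:
    """Average a token sequence of any length into `out_size` windows.

    Window i spans [floor(i*L/out), ceil((i+1)*L/out)), the boundary rule
    PyTorch's AdaptiveAvgPool1d uses; windows may overlap when L/out is
    not an integer.
    """
    length, dim = len(tokens), len(tokens[0])
    pooled = []
    for i in range(out_size):
        start = (i * length) // out_size
        end = -((-(i + 1) * length) // out_size)  # ceiling division
        window = tokens[start:end]
        pooled.append([sum(t[d] for t in window) / len(window) for d in range(dim)])
    return pooled


def attention_pool(pooled: List[Vector], score_weights: Vector) -> Vector:
    """Collapse pooled tokens with softmax attention weights instead of a plain mean.

    `score_weights` stands in for a learned parameter (hypothetical): each
    token's score is its dot product with this vector.
    """
    scores = [sum(w * x for w, x in zip(score_weights, p)) for p in pooled]
    m = max(scores)  # subtract the max before exp() for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(pooled[0])
    return [sum(a * p[d] for a, p in zip(weights, pooled)) for d in range(dim)]


tokens = [[1.0], [2.0], [3.0], [4.0], [5.0]]       # 5 toy encoder outputs, dim 1
pooled = adaptive_avg_pool_1d(tokens, out_size=2)  # [[2.0], [4.0]]
print(attention_pool(pooled, [0.0]))  # equal scores reduce to a plain mean: [3.0]
print(attention_pool(pooled, [1.0]))  # the higher-scoring window gets more weight (> 3.0)
```

When all scores are equal, the attention step collapses back to ordinary averaging. Learned scores, by contrast, let informative regions contribute more than others, which is the kind of flexibility the paper attributes to its replacement module.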

Findings

01. AAViT outperforms the traditional vision transformer in face anti-spoofing.
02. AAViT achieves a lower equal error rate (EER) than the traditional vision transformer on the Replay-Attack database.
03. AAViT surpasses ResNet and other models on the same database.
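The findings above report results as equal error rate (EER). EER is the operating point at which the false acceptance rate (spoofing attacks accepted) equals the false rejection rate (genuine faces rejected). The sketch below is a generic threshold sweep on made-up scores. It is not the paper's evaluation code and does not reproduce the Replay-Attack protocol. On finite data the two rates rarely coincide exactly, so the sketch reports their average at the threshold where they are closest, a common approximation.

```python
from typing import List, Tuple


def equal_error_rate(scores: List[float], labels: List[int]) -> Tuple[float, float]:
    """Return (eer, threshold) for binary anti-spoofing scores.

    labels: 1 = bona fide (genuine) face, 0 = spoofing attack.
    A sample is accepted as bona fide when score >= threshold.
    """
    genuine = [s for s, y in zip(scores, labels) if y == 1]
    attacks = [s for s, y in zip(scores, labels) if y == 0]
    best = None  # (|FAR - FRR|, EER estimate, threshold)
    for t in sorted(set(scores)):
        far = sum(s >= t for s in attacks) / len(attacks)  # attacks wrongly accepted
        frr = sum(s < t for s in genuine) / len(genuine)   # genuine faces wrongly rejected
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Hypothetical scores: first four are bona fide, last four are attacks.
scores = [0.6, 0.8, 0.9, 0.4, 0.1, 0.5, 0.3, 0.7]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
print(equal_error_rate(scores, labels))  # (0.25, 0.6)
```

A lower EER means fewer errors at this balanced operating point, which is the sense in which the findings compare AAViT with the other models.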

Abstract

A traditional vision transformer consists of two parts: a transformer encoder and a multi-layer perceptron (MLP). The former learns features to obtain better representations, while the latter performs classification. Here, the MLP consists of two fully connected (FC) layers, an average value computing module, another FC layer and a softmax layer. However, the average value computing module may discard some useful information, which we aim to preserve with an alternative framework. In this work, we propose a novel vision transformer, referred to as the adaptive-avg-pooling based attention vision transformer (AAViT), that uses adaptive average pooling and attention modules in place of the average value computing module. We evaluate the proposed AAViT on face anti-spoofing using the Replay-Attack database. The experiments show that the AAViT…
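The baseline classification head described in the abstract (FC, FC, average value computing, FC, softmax) can be made concrete with a minimal pure-Python sketch. Several details are assumptions, not values from the paper: the first two FC layers are applied to every token before averaging, a ReLU sits between them, the layer sizes are toy values and all weights are made up. The sketch also shows one concrete kind of information the average throws away. Because every token is reduced to a single mean vector, any reordering of the encoder outputs produces the same class probabilities.

```python
import math
from typing import List, Tuple

Vector = List[float]
Layer = Tuple[List[Vector], Vector]  # (weight rows, bias)


def linear(x: Vector, layer: Layer) -> Vector:
    """Fully connected layer y = W x + b, with one weight row per output unit."""
    weights, bias = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def softmax(z: Vector) -> Vector:
    m = max(z)  # subtract the max before exp() for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def baseline_head(tokens: List[Vector], fc1: Layer, fc2: Layer, fc3: Layer) -> Vector:
    """FC -> ReLU -> FC on every token, average over tokens, then FC -> softmax."""
    hidden = [linear(relu(linear(t, fc1)), fc2) for t in tokens]
    dim = len(hidden[0])
    # "Average value computing": one mean vector, whatever the token order.
    mean = [sum(h[d] for h in hidden) / len(hidden) for d in range(dim)]
    return softmax(linear(mean, fc3))


# Toy weights (hypothetical); two output classes, e.g. bona fide vs. spoof.
fc1 = ([[1.0, -1.0], [0.5, 2.0]], [0.0, 0.1])
fc2 = ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
fc3 = ([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0])
tokens = [[0.2, 0.1], [0.9, -0.3], [0.4, 0.7]]

p = baseline_head(tokens, fc1, fc2, fc3)
q = baseline_head(tokens[::-1], fc1, fc2, fc3)  # same tokens, reversed order
print(p, q)  # equal up to float rounding: the average discards token order
```

Order is only one example of what a plain mean collapses. Any two token sets with the same average hidden vector also become indistinguishable to the final FC and softmax layers.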

Taxonomy

Topics: Biometric Identification and Security · Digital Media Forensic Detection · Face recognition and analysis

Methods: Multi-Head Attention · Attention Is All You Need · Global Average Pooling · 1x1 Convolution · Linear Layer · Kaiming Initialization · Layer Normalization · Residual Connection · Dense Connections