Liveness Detection in Computer Vision: Transformer-based Self-Supervised Learning for Face Anti-Spoofing
Arman Keresh, Pakizar Shamoi

TL;DR
This paper reports that a Vision Transformer (ViT) fine-tuned with the DINO self-supervised framework outperforms a CNN baseline, EfficientNet-B2, on face anti-spoofing, strengthening biometric systems against photo, video, and mask attacks.
Contribution
The paper fine-tunes a ViT with DINO self-supervised learning for face anti-spoofing and reports higher accuracy and greater robustness to different spoofing methods than an EfficientNet-B2 CNN baseline.
Findings
ViT fine-tuned with DINO is more accurate than the EfficientNet-B2 CNN baseline
Transformer models better detect complex spoofing cues
Model validated on standard datasets and on a dataset collected by the authors
Abstract
Face recognition systems are increasingly used in biometric security for convenience and effectiveness. However, they remain vulnerable to spoofing attacks, where attackers use photos, videos, or masks to impersonate legitimate users. This research addresses these vulnerabilities by exploring the Vision Transformer (ViT) architecture, fine-tuned with the DINO framework. The DINO framework facilitates self-supervised learning, enabling the model to learn distinguishing features from unlabeled data. We compared the performance of the proposed fine-tuned ViT model using the DINO framework against a traditional CNN model, EfficientNet-B2, on the face anti-spoofing task. Numerous tests on standard datasets show that the ViT model performs better than the CNN model in terms of accuracy and resistance to different spoofing methods. Additionally, we collected our own dataset from a biometric…
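The abstract does not spell out how DINO learns from unlabeled data. As general background on the DINO framework (not a description of the authors' code), a student network is trained to match the output of a teacher network on different augmented views of the same image. The teacher's output is centered and sharpened with a low temperature, and the teacher's weights follow an exponential moving average of the student's. The sketch below illustrates that objective in plain Python; the function names, temperatures, and momentum value are illustrative assumptions rather than settings taken from this paper.

```python
import math


def log_softmax(logits, temperature):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    log_norm = peak + math.log(sum(math.exp(s - peak) for s in scaled))
    return [s - log_norm for s in scaled]


def dino_loss(student_logits, teacher_logits, center,
              student_temp=0.1, teacher_temp=0.04):
    """Cross-entropy between teacher and student outputs for one view pair.

    The teacher logits are centered (subtract a running mean) and sharpened
    (low temperature); no gradient would flow through the teacher in training.
    Temperatures here are assumed, illustrative defaults.
    """
    centered = [t - c for t, c in zip(teacher_logits, center)]
    teacher_probs = [math.exp(lp) for lp in log_softmax(centered, teacher_temp)]
    student_log_probs = log_softmax(student_logits, student_temp)
    return -sum(p * lq for p, lq in zip(teacher_probs, student_log_probs))


def ema_update(teacher_params, student_params, momentum=0.996):
    """Move each teacher parameter slightly toward the student parameter."""
    return [momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher_params, student_params)]


if __name__ == "__main__":
    center = [0.0, 0.0, 0.0]
    teacher = [2.0, 0.0, -1.0]
    # A student that agrees with the teacher gets a lower loss than one that does not.
    print(dino_loss([2.0, 0.0, -1.0], teacher, center))
    print(dino_loss([-1.0, 0.0, 2.0], teacher, center))
```

In a full training loop this loss would be averaged over many view pairs and minimized with respect to the student only, after which `ema_update` refreshes the teacher. For anti-spoofing, the resulting backbone would then be fine-tuned or probed with live/spoof labels.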
Taxonomy
Topics: Biometric Identification and Security · Face recognition and analysis
Methods: Pointwise Convolution · Depthwise Convolution · Batch Normalization · Depthwise Separable Convolution · RMSProp · Linear Layer · Inverted Residual Block · 1x1 Convolution · Squeeze-and-Excitation Block
