FLIP: Cross-domain Face Anti-spoofing with Language Guidance

Koushik Srivatsan; Muzammal Naseer; Karthik Nandakumar

arXiv:2309.16649 · cs.CV · September 29, 2023 · 2 cites

TL;DR

This paper introduces FLIP, a cross-domain face anti-spoofing method that leverages multimodal pre-trained vision-language models and natural language grounding to improve generalization and zero-shot transfer capabilities.

Contribution

The work shows that initializing vision transformers (ViTs) with multimodal pre-trained weights and aligning visual features with natural-language descriptions improves face anti-spoofing (FAS) generalization. It also introduces a novel multimodal contrastive learning strategy.
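The idea of aligning visual features with language through contrastive learning can be illustrated with a generic CLIP-style symmetric InfoNCE loss. This is a minimal sketch under stated assumptions, not the paper's exact multimodal contrastive objective: the helper names, the toy three-dimensional embeddings, and the temperature of 0.07 are all illustrative choices. Matched image/text pairs sit on the diagonal of the similarity matrix, and the loss averages a cross-entropy over rows (image to text) and over columns (text to image).

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def log_softmax(row: Vector) -> Vector:
    # Numerically stable log-softmax: subtract the max before exponentiating.
    m = max(row)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in row))
    return [z - log_sum_exp for z in row]


def clip_style_contrastive_loss(
    image_embs: List[Vector], text_embs: List[Vector], temperature: float = 0.07
) -> float:
    """Symmetric InfoNCE: image i is the positive for text i, all others are negatives.

    Hypothetical helper for illustration; not taken from the FLIP code base.
    """
    assert len(image_embs) == len(text_embs) > 0
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)
    # logits[i][j] = cosine(image i, text j) / temperature
    logits = [
        [sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    # Image-to-text: cross-entropy over each row, target is the diagonal entry.
    loss_i2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Text-to-image: cross-entropy over each column, target is the diagonal entry.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2i = -sum(log_softmax(columns[j])[j] for j in range(n)) / n
    return 0.5 * (loss_i2t + loss_t2i)


if __name__ == "__main__":
    images = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    texts = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
    shuffled = [texts[1], texts[2], texts[0]]
    print(clip_style_contrastive_loss(images, texts))     # small: pairs are aligned
    print(clip_style_contrastive_loss(images, shuffled))  # larger: pairs are mismatched
```

Shuffling the text list breaks the image/text correspondence, so the same embeddings yield a higher loss; minimizing this objective pulls matched pairs together in the shared embedding space.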

Findings

01. Outperforms state-of-the-art methods on standard cross-domain protocols.

02. Achieves superior zero-shot transfer performance.

03. Improves robustness in low-data regimes.
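Zero-shot transfer in CLIP-style models generally works by comparing an image embedding with text embeddings of class descriptions and picking the most similar class. The sketch below shows that mechanism in plain Python under stated assumptions: the three-dimensional vectors stand in for outputs of image and text encoders, the class names `real` and `spoof` and their prompt lists are hypothetical, and `logit_scale=100.0` is an assumed scaling factor. It is not FLIP's inference code.

```python
import math
from typing import Dict, List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    # Unit-length vectors make the dot product equal to cosine similarity.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(logits: Vector) -> Vector:
    # Subtract the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def class_embedding(prompt_embs: List[Vector]) -> Vector:
    # Average the normalized embeddings of several prompts for one class, then renormalize.
    normed = [l2_normalize(p) for p in prompt_embs]
    dim = len(normed[0])
    mean = [sum(p[i] for p in normed) / len(normed) for i in range(dim)]
    return l2_normalize(mean)


def zero_shot_probs(
    image_emb: Vector, class_prompts: Dict[str, List[Vector]], logit_scale: float = 100.0
) -> Dict[str, float]:
    """Hypothetical helper: class probabilities from image-text cosine similarities."""
    image = l2_normalize(image_emb)
    labels = list(class_prompts)
    logits = [logit_scale * dot(image, class_embedding(class_prompts[c])) for c in labels]
    return dict(zip(labels, softmax(logits)))


if __name__ == "__main__":
    prompts = {
        "real": [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1]],   # toy stand-ins for "real face" prompts
        "spoof": [[0.0, 1.0, 0.2], [0.1, 0.9, 0.0]],  # toy stand-ins for "spoof face" prompts
    }
    print(zero_shot_probs([0.8, 0.2, 0.05], prompts))
```

Averaging several prompt embeddings per class is one common way to make the class descriptor less sensitive to the exact wording of any single prompt; no classifier weights are trained in this setting.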

Abstract

Face anti-spoofing (FAS) or presentation attack detection is an essential component of face recognition systems deployed in security-critical applications. Existing FAS methods have poor generalizability to unseen spoof types, camera sensors, and environmental conditions. Recently, vision transformer (ViT) models have been shown to be effective for the FAS task due to their ability to capture long-range dependencies among image patches. However, adaptive modules or auxiliary loss functions are often required to adapt pre-trained ViT weights learned on large-scale datasets such as ImageNet. In this work, we first show that initializing ViTs with multimodal (e.g., CLIP) pre-trained weights improves generalizability for the FAS task, which is in line with the zero-shot transfer capabilities of vision-language pre-trained (VLP) models. We then propose a novel approach for robust…
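The abstract credits ViTs with capturing long-range dependencies among image patches. A minimal sketch of that mechanism is single-head scaled dot-product self-attention over patch tokens, assuming a toy 4x4 grayscale image, 2x2 patches, and identity matrices in place of learned query/key/value projections. A real ViT also uses a learned patch embedding, positional embeddings, a class token, and multiple heads and layers; all helper names here are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def patchify(image: Matrix, patch: int) -> Matrix:
    """Split an H x W image into non-overlapping patch x patch tiles, one flattened token per tile."""
    h, w = len(image), len(image[0])
    assert h % patch == 0 and w % patch == 0
    tokens = []
    for r in range(0, h, patch):
        for c in range(0, w, patch):
            tokens.append([image[r + i][c + j] for i in range(patch) for j in range(patch)])
    return tokens


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # zip(*b) iterates over the columns of b.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row: List[float]) -> List[float]:
    m = max(row)
    exps = [math.exp(z - m) for z in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: Matrix, wq: Matrix, wk: Matrix, wv: Matrix) -> Tuple[Matrix, Matrix]:
    """Single-head scaled dot-product attention; returns (outputs, attention weights)."""
    q, k, v = matmul(tokens, wq), matmul(tokens, wk), matmul(tokens, wv)
    d = len(q[0])
    scores = [[sum(x * y for x, y in zip(qi, kj)) / math.sqrt(d) for kj in k] for qi in q]
    weights = [softmax(row) for row in scores]  # each row sums to 1
    return matmul(weights, v), weights


def identity(n: int) -> Matrix:
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    image = [
        [0.1, 0.2, 0.9, 0.8],
        [0.3, 0.1, 0.7, 0.9],
        [0.5, 0.6, 0.2, 0.1],
        [0.4, 0.5, 0.0, 0.3],
    ]
    tokens = patchify(image, 2)  # 4 tokens, each of length 4
    eye = identity(len(tokens[0]))
    _, attn = self_attention(tokens, eye, eye, eye)
    # The top-left patch (token 0) puts nonzero weight on the bottom-right patch (token 3).
    print(attn[0][3])
```

Because every token scores against every other token in a single layer, distant patches (here the top-left and bottom-right tiles) interact directly, unlike a small convolution kernel whose receptive field grows only with depth.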

Taxonomy

Topics: Biometric Identification and Security · Face recognition and analysis · Reconstructive Facial Surgery Techniques

Methods: Attention Is All You Need · Softmax · Linear Layer · Multi-Head Attention · Residual Connection · Dense Connections · Layer Normalization · Vision Transformer · Contrastive Learning