Interpretable Face Anti-Spoofing: Enhancing Generalization with Multimodal Large Language Models

Guosheng Zhang; Keyao Wang; Haixiao Yue; Ajian Liu; Gang Zhang; Kun Yao; Errui Ding; Jingdong Wang

arXiv:2501.01720·cs.CV·January 27, 2025

TL;DR

This paper introduces I-FAS, a multimodal large language model framework for face anti-spoofing that offers interpretable results and improved generalization across diverse datasets by transforming the task into a visual question answering paradigm.
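The key move in the TL;DR is recasting binary FAS as visual question answering. The sketch below shows what that reframing could look like at the data level. It assumes a single fixed question and an answer that opens with a "real"/"spoof" verdict followed by an optional caption. The question wording, the answer template, and the helpers `to_vqa_sample` and `parse_verdict` are hypothetical illustrations and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical question text; the paper's actual prompt is not shown on this page.
QUESTION = "Is the face in this image real or a spoof? Explain why."


@dataclass
class VQASample:
    image_path: str
    question: str
    answer: str


def to_vqa_sample(image_path: str, is_spoof: bool,
                  caption: Optional[str] = None) -> VQASample:
    """Turn a binary FAS label (plus an optional caption) into a VQA pair."""
    verdict = "spoof" if is_spoof else "real"
    answer = f"This is a {verdict} face."
    if caption:
        # The caption supplies the natural-language interpretation.
        answer += " " + caption.strip()
    return VQASample(image_path, QUESTION, answer)


def parse_verdict(generated: str) -> Optional[bool]:
    """Map free-form model output back to a decision (True = spoof).

    Naive keyword matching: whichever of "spoof"/"real" appears first wins.
    Returns None if neither keyword appears, so callers can treat it as
    an abstention instead of guessing.
    """
    text = generated.lower()
    hits = {kw: text.find(kw) for kw in ("spoof", "real")}
    hits = {kw: pos for kw, pos in hits.items() if pos >= 0}
    if not hits:
        return None
    return min(hits, key=hits.get) == "spoof"


sample = to_vqa_sample("face_0001.jpg", is_spoof=True,
                       caption="Moire patterns suggest a screen replay.")
print(sample.answer)
print(parse_verdict(sample.answer))
```

Keeping the verdict at the start of the answer makes it easy to recover a binary decision for standard FAS metrics. The trailing caption carries the interpretation. Keyword matching is deliberately simple here and would misfire on words such as "realistic".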

Contribution

The work proposes a novel interpretable FAS framework built on multimodal LLMs. It combines a Spoof-aware Captioning and Filtering (SCF) strategy, a Lopsided Language Model (L-LM) loss, and a global visual feature alignment technique.

Findings

1. Outperforms state-of-the-art methods on multiple benchmarks
2. Achieves better generalization in cross-domain scenarios
3. Provides interpretable visual question answering outputs

Abstract

Face Anti-Spoofing (FAS) is essential for ensuring the security and reliability of facial recognition systems. Most existing FAS methods are formulated as binary classification tasks, providing confidence scores without interpretation. They exhibit limited generalization in out-of-domain scenarios, such as new environments or unseen spoofing types. In this work, we introduce a multimodal large language model (MLLM) framework for FAS, termed Interpretable Face Anti-Spoofing (I-FAS), which transforms the FAS task into an interpretable visual question answering (VQA) paradigm. Specifically, we propose a Spoof-aware Captioning and Filtering (SCF) strategy to generate high-quality captions for FAS images, enriching the model's supervision with natural language interpretations. To mitigate the impact of noisy captions during training, we develop a Lopsided Language Model (L-LM) loss function…
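The abstract states only that the Lopsided Language Model (L-LM) loss mitigates the impact of noisy captions. Its exact formulation is cut off here. The sketch below is a hedged illustration of one way such a loss could work. It assumes a weighted token-level cross-entropy in which the verdict tokens get full weight and caption tokens are down-weighted by a hypothetical `caption_weight` (default 0.1, chosen arbitrarily). The function names and the weighting scheme are assumptions and should not be read as the paper's definition.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one position's vocabulary logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def lopsided_lm_loss(logits: Sequence[Sequence[float]],
                     targets: Sequence[int],
                     judgment_mask: Sequence[bool],
                     caption_weight: float = 0.1) -> float:
    """Weighted mean negative log-likelihood over answer tokens.

    Assumed (hypothetical) form: verdict tokens get weight 1.0 and caption
    tokens get `caption_weight`, so noisy captions pull less on the model.
    """
    if not (len(logits) == len(targets) == len(judgment_mask)):
        raise ValueError("logits, targets and judgment_mask must align")
    total, norm = 0.0, 0.0
    for row, target, is_judgment in zip(logits, targets, judgment_mask):
        weight = 1.0 if is_judgment else caption_weight
        # Negative log-probability of the target token, scaled by its weight.
        total += -weight * log_softmax(row)[target]
        norm += weight
    if norm == 0.0:
        raise ValueError("all token weights are zero")
    return total / norm
```

With `caption_weight=1.0`, the loss reduces to the ordinary mean token cross-entropy. With `caption_weight=0.0`, only the verdict tokens are trained. Values in between trade interpretation supervision against robustness to caption noise.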

Videos

Interpretable Face Anti-Spoofing: Enhancing Generalization with Multimodal Large Language Models (Underline)

Taxonomy

Topics: Face recognition and analysis · Biometric identification and security

Methods: Attentive Walk-Aggregating Graph Neural Network · ALIGN