Harnessing Chain-of-Thought Reasoning in Multimodal Large Language Models for Face Anti-Spoofing

Honglu Zhang; Zhiqin Fang; Ningning Zhao; Saihui Hou; Long Ma; Renwang Pei; Zhaofeng He

arXiv:2506.01783 · cs.CV · March 3, 2026

TL;DR

This paper introduces FaceCoT, a large-scale multimodal dataset with chain-of-thought (CoT) annotations for face anti-spoofing (FAS), and proposes CoT-Enhanced Progressive Learning (CEPL), a training strategy reported to improve model robustness and interpretability.

Contribution

The paper presents FaceCoT, described as the first large-scale Visual Question Answering (VQA) dataset tailored for face anti-spoofing (FAS), and a CoT-Enhanced Progressive Learning (CEPL) method that uses multimodal reasoning to improve anti-spoofing performance.

Findings

01. Models trained with FaceCoT and CEPL outperform existing methods.

02. The dataset covers 14 spoofing attack types with high-quality CoT VQA annotations.

03. Experimental results show improved robustness and interpretability in FAS models.

Abstract

Face Anti-Spoofing (FAS) typically depends on a single visual modality when defending against presentation attacks such as print attacks, screen replays, and 3D masks, resulting in limited generalization across devices, environments, and attack types. Meanwhile, Multimodal Large Language Models (MLLMs) have recently achieved breakthroughs in image-text understanding and semantic reasoning, suggesting that integrating visual and linguistic co-inference into FAS can substantially improve both robustness and interpretability. However, the lack of a high-quality vision-language multimodal dataset has been a critical bottleneck. To address this, we introduce FaceCoT (Face Chain-of-Thought), the first large-scale Visual Question Answering (VQA) dataset tailored for FAS. FaceCoT covers 14 spoofing attack types and enriches model learning with high-quality CoT VQA annotations. Meanwhile, we…

Taxonomy

Topics: Biometric Identification and Security · Face recognition and analysis