From Intuition to Investigation: A Tool-Augmented Reasoning MLLM Framework for Generalizable Face Anti-Spoofing

Haoyuan Zhang; Keyao Wang; Guosheng Zhang; Haixiao Yue; Zhiwen Tan; Siran Peng; Tianshuo Zhang; Xiao Tan; Kunbin Chen; Wei He; Jingdong Wang; Ajian Liu; Xiangyu Zhu; Zhen Lei

arXiv:2603.01038 · cs.CV · March 23, 2026

Open Access

TL;DR

This paper introduces TAR-FAS, a framework that enhances face anti-spoofing by integrating external visual tools with MLLMs, enabling fine-grained investigation and improving cross-domain generalization.

Contribution

The paper proposes a Chain-of-Thought with Visual Tools (CoT-VT) paradigm, a tool-augmented data annotation pipeline, and a training method that lets MLLMs invoke visual tools autonomously for spoof detection.

Findings

01. Achieves state-of-the-art performance on cross-domain face anti-spoofing tasks.

02. Demonstrates effective fine-grained visual investigation capabilities.

03. Constructs the large-scale ToolFAS-16K dataset for training and evaluation.

Abstract

Face recognition remains vulnerable to presentation attacks, calling for robust Face Anti-Spoofing (FAS) solutions. Recent MLLM-based FAS methods reformulate the binary classification task as the generation of brief textual descriptions to improve cross-domain generalization. However, their generalizability is still limited, as such descriptions mainly capture intuitive semantic cues (e.g., mask contours) while struggling to perceive fine-grained visual patterns. To address this limitation, we incorporate external visual tools into MLLMs to encourage deeper investigation of subtle spoof clues. Specifically, we propose the Tool-Augmented Reasoning FAS (TAR-FAS) framework, which reformulates the FAS task as a Chain-of-Thought with Visual Tools (CoT-VT) paradigm, allowing MLLMs to begin with intuitive observations and adaptively invoke external visual tools for fine-grained investigation.…
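
The control flow the abstract describes (start from an intuitive observation, then adaptively call external visual tools before deciding) can be illustrated with a toy loop. This is only a minimal sketch under stated assumptions, not the TAR-FAS implementation: the tool names (`mean_abs_gradient`, `intensity_range`), the `stub_policy` function standing in for the MLLM, the `ToolCall`/`Verdict` types, and the 0.4 threshold are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union

Image = List[List[float]]  # toy grayscale image, pixel values in [0, 1]


@dataclass
class ToolCall:
    name: str      # which tool the policy wants to run next
    thought: str   # reasoning text attached to this step


@dataclass
class Verdict:
    label: str     # "real" or "spoof"
    thought: str


def mean_abs_gradient(img: Image) -> float:
    """Hypothetical tool: mean absolute horizontal pixel difference."""
    diffs = [abs(row[i + 1] - row[i]) for row in img for i in range(len(row) - 1)]
    return sum(diffs) / len(diffs) if diffs else 0.0


def intensity_range(img: Image) -> float:
    """Hypothetical tool: spread between brightest and darkest pixel."""
    flat = [p for row in img for p in row]
    return max(flat) - min(flat)


TOOLS: Dict[str, Callable[[Image], float]] = {
    "mean_abs_gradient": mean_abs_gradient,
    "intensity_range": intensity_range,
}


def stub_policy(observations: Dict[str, float]) -> Union[ToolCall, Verdict]:
    """Stand-in for the MLLM: decides whether to call a tool or answer.

    The 0.4 threshold is arbitrary and only makes the toy example work.
    """
    if "mean_abs_gradient" not in observations:
        return ToolCall("mean_abs_gradient",
                        "No obvious semantic cue; inspect fine texture.")
    if observations["mean_abs_gradient"] > 0.4:
        return Verdict("spoof", "Strong high-frequency pattern in the tool output.")
    return Verdict("real", "Tool output shows no unusual texture.")


def investigate(img: Image,
                policy: Callable[[Dict[str, float]], Union[ToolCall, Verdict]],
                tools: Dict[str, Callable[[Image], float]],
                max_steps: int = 4) -> Tuple[str, List[str]]:
    """Alternate between reasoning steps and tool calls until a verdict."""
    observations: Dict[str, float] = {}
    trace: List[str] = []
    for _ in range(max_steps):
        step = policy(observations)
        trace.append(step.thought)
        if isinstance(step, Verdict):
            return step.label, trace
        # Run the requested tool and feed its result back to the policy.
        observations[step.name] = tools[step.name](img)
    return "undecided", trace


striped = [[0.0, 1.0, 0.0, 1.0] for _ in range(4)]
label, trace = investigate(striped, stub_policy, TOOLS)  # label == "spoof"
```

In this toy setup the verdict is reached only after a tool result has been fed back into the policy, which is the "intuition first, investigation second" ordering the abstract describes. A real system would replace `stub_policy` with model inference and the numeric tools with actual image-analysis operations.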

Taxonomy

Topics: Biometric Identification and Security · Face recognition and analysis · Reconstructive Facial Surgery Techniques