From Intuition to Investigation: A Tool-Augmented Reasoning MLLM Framework for Generalizable Face Anti-Spoofing
Haoyuan Zhang, Keyao Wang, Guosheng Zhang, Haixiao Yue, Zhiwen Tan, Siran Peng, Tianshuo Zhang, Xiao Tan, Kunbin Chen, Wei He, Jingdong Wang, Ajian Liu, Xiangyu Zhu, Zhen Lei

TL;DR
This paper introduces TAR-FAS, a framework that enhances face anti-spoofing by integrating external visual tools with MLLMs, enabling fine-grained investigation and improving cross-domain generalization.
Contribution
It proposes a novel Chain-of-Thought with Visual Tools paradigm, a tool-augmented data annotation pipeline, and a training method that allows MLLMs to autonomously utilize visual tools for better spoof detection.
Findings
Achieves state-of-the-art performance on cross-domain face anti-spoofing tasks.
Demonstrates effective fine-grained visual investigation capabilities.
Constructed the large-scale ToolFAS-16K dataset for training and evaluation.
Abstract
Face recognition remains vulnerable to presentation attacks, calling for robust Face Anti-Spoofing (FAS) solutions. Recent MLLM-based FAS methods reformulate the binary classification task as the generation of brief textual descriptions to improve cross-domain generalization. However, their generalizability is still limited, as such descriptions mainly capture intuitive semantic cues (e.g., mask contours) while struggling to perceive fine-grained visual patterns. To address this limitation, we incorporate external visual tools into MLLMs to encourage deeper investigation of subtle spoof clues. Specifically, we propose the Tool-Augmented Reasoning FAS (TAR-FAS) framework, which reformulates the FAS task as a Chain-of-Thought with Visual Tools (CoT-VT) paradigm, allowing MLLMs to begin with intuitive observations and adaptively invoke external visual tools for fine-grained investigation.…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsBiometric Identification and Security · Face recognition and analysis · Reconstructive Facial Surgery Techniques
