InstructFLIP: Exploring Unified Vision-Language Model for Face Anti-spoofing

Kun-Hsiang Lin; Yu-Wen Tseng; Kang-Yang Huang; Jhih-Ciang Wu; Wen-Huang Cheng

arXiv:2507.12060·cs.CV·July 31, 2025

TL;DR

InstructFLIP is an instruction-tuned vision-language framework for face anti-spoofing. It improves cross-domain generalization through textual guidance, decouples instructions into content and style components, and reduces training redundancy across domains.

Contribution

The paper introduces InstructFLIP, a unified vision-language model that enhances face anti-spoofing by integrating textual instructions and a meta-domain strategy for better generalization.

Findings

01. Outperforms state-of-the-art models in accuracy.

02. Reduces training redundancy across multiple domains.

03. Effectively decouples content and style instructions.

Abstract

Face anti-spoofing (FAS) aims to construct a robust system that can withstand diverse attacks. While recent efforts have concentrated mainly on cross-domain generalization, two significant challenges persist: limited semantic understanding of attack types and training redundancy across domains. We address the first by integrating vision-language models (VLMs) to enhance the perception of visual input. For the second challenge, we employ a meta-domain strategy to learn a unified model that generalizes well across multiple domains. Our proposed InstructFLIP is a novel instruction-tuned framework that leverages VLMs to enhance generalization via textual guidance trained solely on a single domain. At its core, InstructFLIP explicitly decouples instructions into content and style components, where content-based instructions focus on the essential semantics of spoofing, and style-based…
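
To make the idea of decoupled instructions concrete, here is a minimal sketch of how content and style prompts could be kept in separate groups before being passed to a VLM. It is not taken from the paper or its repository: the `Instruction` class, the `format_prompt` and `build_prompts` helpers, and all question and choice strings are hypothetical. The abstract above is cut off before it describes the style side, so the style question in particular is only a placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instruction:
    """One multiple-choice instruction (hypothetical structure, not the paper's)."""
    kind: str       # "content" or "style"
    question: str
    choices: tuple


def format_prompt(instr):
    """Render an instruction as a single prompt string with numbered options."""
    options = " ".join(f"({i + 1}) {c}" for i, c in enumerate(instr.choices))
    return f"[{instr.kind}] {instr.question} Options: {options}"


def build_prompts(instructions):
    """Group prompts by kind so content and style stay decoupled."""
    prompts = {"content": [], "style": []}
    for instr in instructions:
        if instr.kind not in prompts:
            raise ValueError(f"unknown instruction kind: {instr.kind!r}")
        prompts[instr.kind].append(format_prompt(instr))
    return prompts


# Illustrative instructions only; the wording and choices are assumptions.
EXAMPLE_INSTRUCTIONS = [
    Instruction(
        "content",
        "Which type of spoof attack, if any, is shown?",
        ("live face", "printed photo", "screen replay"),
    ),
    Instruction(
        "style",
        "What are the capture conditions?",  # placeholder style question
        ("indoor", "outdoor"),
    ),
]

if __name__ == "__main__":
    for kind, group in build_prompts(EXAMPLE_INSTRUCTIONS).items():
        for prompt in group:
            print(prompt)
```

Keeping the two groups in separate lists means each can be routed to its own query or loss term, which is one plausible reading of "decoupling"; how InstructFLIP actually does this is not specified in the text above.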

Code & Models

Repositories

kunkunlin1221/InstructFLIP (PyTorch, official)
