TL;DR
UniShield introduces a knowledge-grounded multimodal reasoning framework for unified face attack detection, leveraging a structured attack knowledge graph and instruction tuning to enhance detection accuracy and reasoning reliability.
Contribution
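The paper presents a framework that combines a Face Attack Knowledge Graph (FAKG), Attack-Graph Instruction Tuning (AGIT) on graph-derived QA examples, and Graph-Consistent Reasoning Optimization (GCRO) to improve unified face attack detection and the consistency of its rationales.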
Findings
Achieves high accuracy and low error rates on a new multimodal UAD benchmark.
Effectively links attack categories to visual cues for better reasoning.
Outperforms baseline methods in detection and explanation quality.
Abstract
Unified face attack detection (UAD) requires recognizing physical spoofing and digital forgery within a shared decision space, yet existing discriminative or prompt-based methods largely rely on appearance correlations and provide limited evidence-grounded reasoning. We propose UniShield, a knowledge-grounded multimodal reasoning framework for unified face attack defense. UniShield constructs a Face Attack Knowledge Graph (FAKG) that links attack categories to diagnostic visual cues and attack-conditioned relations, and uses it to synthesize 52,025 FAKG-QA examples for Attack-Graph Instruction Tuning (AGIT). To improve rationale consistency, we further introduce Graph-Consistent Reasoning Optimization (GCRO), a GRPO-based objective with a KG-consistency reward that encourages generated rationales to match graph-supported cues while penalizing incompatible claims. Experiments on our…
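The KG-consistency reward described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. The toy graph contents, the function names (`kg_consistency_reward`, `group_relative_advantages`), the `penalty` weight, and the assumption that cues have already been extracted from each rationale are all hypothetical. The second function shows the group-relative normalization commonly used in GRPO, where each sampled response is scored against the mean and standard deviation of its group.

```python
from statistics import mean, pstdev

# Hypothetical toy graph: attack category -> set of graph-supported cues.
# The real FAKG also encodes attack-conditioned relations, omitted here.
FAKG = {
    "print_attack": {"paper texture", "flat depth", "moire pattern"},
    "replay_attack": {"moire pattern", "screen bezel", "screen glare"},
    "face_swap": {"blending boundary", "inconsistent lighting"},
}

def kg_consistency_reward(predicted_attack, claimed_cues, graph=FAKG, penalty=1.0):
    """Reward cues the graph supports for the predicted attack and
    penalize cues it does not support (assumed scoring rule)."""
    supported = graph.get(predicted_attack, set())
    claimed = set(claimed_cues)
    if not claimed:
        return 0.0  # no evidence cited, so no reward and no penalty
    hits = len(claimed & supported)
    misses = len(claimed - supported)
    return (hits - penalty * misses) / len(claimed)

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: normalize each reward within its sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Three sampled rationales for the same input, with pre-extracted cues.
group = [
    ("print_attack", ["paper texture", "flat depth"]),    # fully supported
    ("print_attack", ["paper texture", "screen bezel"]),  # one unsupported cue
    ("face_swap", ["screen bezel"]),                      # unsupported
]
rewards = [kg_consistency_reward(a, c) for a, c in group]
advantages = group_relative_advantages(rewards)
```

Under this assumed scoring rule, a rationale that cites only graph-supported cues receives the highest advantage in its group, and one that cites cues incompatible with its predicted attack is pushed down.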
