In-context Learning of Vision Language Models for Detection of Physical and Digital Attacks against Face Recognition Systems

Lazaro Janier Gonzalez-Soler; Maciej Salwowski; Christoph Busch

arXiv:2507.15285 · cs.CV · July 22, 2025

TL;DR

This paper explores using Vision Language Models with in-context learning to detect physical and digital attacks on face recognition systems, offering a resource-efficient alternative to traditional deep learning methods.

Contribution

The paper introduces the first systematic framework for evaluating VLMs in security scenarios and demonstrates their competitive performance in attack detection.

Findings

01 VLMs outperform some CNNs without extensive training

02 Framework shows strong generalization in attack detection

03 Open-source models are effectively utilized

Abstract

Recent advances in biometric systems have significantly improved the detection and prevention of fraudulent activities. However, as detection methods improve, attack techniques become increasingly sophisticated. Attacks on face recognition systems can be broadly divided into physical and digital approaches. Traditionally, deep learning models have been the primary defence against such attacks. While these models perform exceptionally well in scenarios for which they have been trained, they often struggle to adapt to different types of attacks or varying environmental conditions. These subsystems require substantial amounts of training data to achieve reliable performance, yet biometric data collection faces significant challenges, including privacy concerns and the logistical difficulties of capturing diverse attack scenarios under controlled conditions. This work investigates the…
