Generalist Multimodal LLMs Gain Biometric Expertise via Human Salience
Jacob Piland, Byron Dowling, Christopher Sweet, and Adam Czajka

TL;DR
This paper demonstrates that general-purpose multimodal large language models, when augmented with human expert knowledge, can effectively perform iris presentation attack detection within privacy constraints, outperforming specialized models and human examiners.
Contribution
It shows that pre-trained vision transformers in MLLMs can cluster iris attack types and, with structured prompts, resolve ambiguities, enabling effective iris PAD without sharing biometric data.
Findings
Gemini with expert prompts outperforms a CNN baseline and human examiners.
A locally deployable Llama model achieves near-human performance.
Pre-trained vision transformers inherently cluster attack types.
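One way to probe the clustering finding is to check how often an embedding lies closest to the centroid of its own class. The sketch below is an illustration only, not the authors' analysis: the embeddings are synthetic random vectors, the class names are hypothetical placeholders, and the function name nearest_centroid_accuracy is introduced here for the example. With real data, the embeddings array would hold vision encoder outputs for iris images.

```python
import numpy as np


def nearest_centroid_accuracy(embeddings, labels):
    """Fraction of samples whose nearest class centroid matches their own label.

    Centroids are computed from the same samples being scored, so this is a
    rough separability check, not a held-out evaluation.
    """
    classes = np.unique(labels)
    # One mean vector per class, shape (n_classes, dim).
    centroids = np.stack([embeddings[labels == c].mean(axis=0) for c in classes])
    # Euclidean distance from every sample to every centroid,
    # shape (n_samples, n_classes).
    dists = np.linalg.norm(embeddings[:, None, :] - centroids[None, :, :], axis=2)
    predicted = classes[dists.argmin(axis=1)]
    return float((predicted == labels).mean())


# Synthetic stand-in for encoder embeddings (assumption: not real iris data).
rng = np.random.default_rng(0)
class_names = np.array(["bona_fide", "attack_a", "attack_b"])  # hypothetical labels
centers = rng.normal(scale=5.0, size=(3, 16))
labels = np.repeat(class_names, 50)
embeddings = np.repeat(centers, 50, axis=0) + rng.normal(scale=1.0, size=(150, 16))

score = nearest_centroid_accuracy(embeddings, labels)
print(f"nearest-centroid accuracy: {score:.2f}")
```

A score near 1.0 suggests the classes form well-separated groups in embedding space, while a score near chance suggests they overlap. Classes that score poorly are the kind of ambiguous cases that the structured prompts are meant to resolve.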
Abstract
Iris presentation attack detection (PAD) is critical for secure biometric deployments, yet developing specialized models faces significant practical barriers: collecting data that represents future, unknown attacks is impossible, and collecting data that is diverse enough, yet still limited in its predictive power, is expensive. Additionally, sharing biometric data raises privacy concerns. Because the rapid emergence of new attack vectors demands adaptable solutions, we investigate in this paper whether general-purpose multimodal large language models (MLLMs) can perform iris PAD when augmented with human expert knowledge, operating under strict privacy constraints that prohibit sending biometric data to public cloud MLLM services. Through analysis of vision encoder embeddings applied to our dataset, we demonstrate that pre-trained vision transformers in MLLMs inherently cluster many…
Taxonomy
Topics: Biometric Identification and Security · Face recognition and analysis · Adversarial Robustness in Machine Learning
