Emergent Morphing Attack Detection in Open Multi-modal Large Language Models
Marija Ivanovska, Vitomir Štruc

TL;DR
This paper demonstrates that open-source multimodal large language models can effectively detect face morphing attacks in a zero-shot setting, outperforming specialized models and revealing new forensic capabilities.
Contribution
First systematic zero-shot evaluation of open-source MLLMs for face morphing attack detection, showing their strong discriminative ability without fine-tuning.
Findings
MLLMs show non-trivial morphing detection ability without training.
LLaVA1.6-Mistral-7B surpasses task-specific MAD baselines by at least 23% in terms of equal error rate (EER).
Multimodal pretraining encodes facial inconsistencies for forensic analysis.
Abstract
Face morphing attacks threaten biometric verification, yet most morphing attack detection (MAD) systems require task-specific training and generalize poorly to unseen attack types. Meanwhile, open-source multimodal large language models (MLLMs) have demonstrated strong visual-linguistic reasoning, but their potential in biometric forensics remains underexplored. In this paper, we present the first systematic zero-shot evaluation of open-source MLLMs for single-image MAD, using publicly available weights and a standardized, reproducible protocol. Across diverse morphing techniques, many MLLMs show non-trivial discriminative ability without any fine-tuning or domain adaptation, and LLaVA1.6-Mistral-7B achieves state-of-the-art performance, surpassing highly competitive task-specific MAD baselines by at least 23% in terms of equal error rate (EER). The results indicate that multimodal…
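The abstract reports results as equal error rate (EER), the operating point at which the rate of missed morphs equals the rate of bona fide images wrongly flagged. The sketch below shows one common way to estimate it from detector scores. It is illustrative only: the function name, the toy scores, and the convention that a higher score means "more likely a morph" are assumptions, and the paper's own procedure for turning MLLM outputs into scores is not described in this summary.

```python
from typing import Sequence, Tuple


def equal_error_rate(
    bona_fide_scores: Sequence[float],
    morph_scores: Sequence[float],
) -> Tuple[float, float]:
    """Estimate the EER and the threshold where it occurs.

    Assumed convention (not from the paper): higher score = more likely a morph,
    and an image is flagged as a morph when score >= threshold.
    """
    # Candidate thresholds: every distinct observed score.
    thresholds = sorted(set(bona_fide_scores) | set(morph_scores))
    best_gap, best_eer, best_threshold = None, None, None
    for t in thresholds:
        # Morphs that slip through (score below the threshold).
        miss_rate = sum(s < t for s in morph_scores) / len(morph_scores)
        # Bona fide images wrongly flagged as morphs.
        false_alarm_rate = sum(s >= t for s in bona_fide_scores) / len(bona_fide_scores)
        gap = abs(miss_rate - false_alarm_rate)
        if best_gap is None or gap < best_gap:
            best_gap = gap
            # With finite data the two rates rarely match exactly,
            # so report their average at the closest crossing.
            best_eer = (miss_rate + false_alarm_rate) / 2
            best_threshold = t
    return best_eer, best_threshold


# Toy scores, invented for illustration.
bona_fide = [0.1, 0.2, 0.3, 0.6]
morphs = [0.4, 0.7, 0.8, 0.9]
eer, threshold = equal_error_rate(bona_fide, morphs)
print(f"EER = {eer:.2f} at threshold {threshold}")  # EER = 0.25 at threshold 0.6
```

A lower EER is better. Note that "at least 23%" in the abstract does not say whether the improvement is relative or absolute, so the sketch makes no attempt to reproduce that figure.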
Taxonomy
Topics: Face recognition and analysis · Biometric Identification and Security · Face Recognition and Perception
