Empowering Morphing Attack Detection using Interpretable Image-Text Foundation Model
Sushrut Patwardhan, Raghavendra Ramachandra, Sushma Venkatesh

TL;DR
This paper introduces a multimodal, interpretable approach to morphing attack detection in face recognition. It leverages the zero-shot capabilities of CLIP both to generalize detection and to predict the most relevant textual description of each image.
Contribution
It presents a novel multimodal framework that combines image analysis with textual prompts for interpretable, zero-shot morphing attack detection, evaluated across multiple datasets and morphing techniques.
Findings
Zero-shot detection achieves high generalization across morphing techniques.
Textual prompts improve interpretability and detection accuracy.
Framework outperforms state-of-the-art methods in diverse scenarios.
Abstract
Morphing attack detection has become an essential component of face recognition systems for ensuring a reliable verification scenario. In this paper, we present a multimodal learning approach that can provide a textual description of its morphing attack detection decisions. We first show that zero-shot evaluation of the proposed framework using Contrastive Language-Image Pretraining (CLIP) can not only achieve generalizable morphing attack detection but also predict the most relevant text snippet. We present an extensive analysis of ten different textual prompts, including both short and long prompts. These prompts are engineered with human-understandable textual snippets in mind. Extensive experiments were performed on a face morphing dataset that was developed using a publicly available face biometric dataset. We present an evaluation of state-of-the-art (SOTA) pre-trained neural networks together with…
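The abstract's central mechanism is CLIP-style zero-shot matching. Each image and each textual prompt are embedded into a shared space, and their cosine similarities are scaled by a temperature and passed through a softmax. The highest-scoring prompt then serves as both the decision and its human-readable description. The sketch below is a minimal, self-contained illustration of that scoring step and is not the authors' implementation. The prompt strings, the toy 3-D embeddings, the `morph`/`bona fide` label mapping, and the `logit_scale` of 100 are all hypothetical placeholders. Real embeddings would come from a CLIP image encoder and text encoder.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("zero-norm embedding")
    return dot / (norm_a * norm_b)


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_predict(
    image_emb: Sequence[float],
    prompt_embs: Sequence[Sequence[float]],
    prompts: Sequence[str],
    labels: Sequence[str],
    logit_scale: float = 100.0,  # assumption: CLIP-style temperature
) -> Tuple[str, str, List[float]]:
    """Score one image against every prompt and return the best match.

    Returns (class label of the best prompt, the prompt text itself,
    per-prompt probabilities).
    """
    logits = [logit_scale * cosine_similarity(image_emb, t) for t in prompt_embs]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], prompts[best], probs


# Hypothetical prompts and toy 3-D embeddings. Real ones would come from
# CLIP's text and image encoders and have hundreds of dimensions.
prompts = [
    "a face image created by morphing two different people",
    "a genuine photo of a single person's face",
]
labels = ["morph", "bona fide"]
prompt_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
image_emb = [0.9, 0.1, 0.0]

label, best_prompt, probs = zero_shot_predict(image_emb, prompt_embs, prompts, labels)
print(label, "|", best_prompt)
```

With these toy vectors, the image lies much closer to the first prompt, so the sketch returns the `morph` label together with that prompt's text. Because the winning prompt is a readable sentence, a single argmax yields both the classification and a textual explanation. In the paper's setting, the candidate set would presumably be the engineered prompts it analyzes rather than these two placeholders.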
