VerLM: Explaining Face Verification Using Natural Language
Syed Abdul Hannan, Hazim Bukhari, Thomas Cantalapiedra, Eman Ansar, Massa Baali, Rita Singh, Bhiksha Raj

TL;DR
This paper presents VerLM, a vision-language model for face verification that decides whether two face images depict the same individual and explains each decision in natural language, making its decisions more transparent and interpretable.
Contribution
Introduces a cross-modal vision-language model for face verification that offers explicit natural language explanations, improving transparency and accuracy over existing methods.
Findings
Outperforms baseline face verification models
Provides both concise and detailed explanations
Enhances interpretability and reliability of face verification
Abstract
Face verification systems have seen substantial advancements; however, they often lack transparency in their decision-making processes. In this paper, we introduce an innovative Vision-Language Model (VLM) for Face Verification, which not only accurately determines if two face images depict the same individual but also explicitly explains the rationale behind its decisions. Our model is uniquely trained using two complementary explanation styles: (1) concise explanations that summarize the key factors influencing its decision, and (2) comprehensive explanations detailing the specific differences observed between the images. We adapt and enhance a state-of-the-art modeling approach originally designed for audio-based differentiation to suit visual inputs effectively. This cross-modal transfer significantly improves our model's accuracy and interpretability. The proposed VLM integrates…
Taxonomy
Topics: Face recognition and analysis · Face Recognition and Perception · Generative Adversarial Networks and Image Synthesis
