VerLM: Explaining Face Verification Using Natural Language

Syed Abdul Hannan; Hazim Bukhari; Thomas Cantalapiedra; Eman Ansar; Massa Baali; Rita Singh; Bhiksha Raj

arXiv:2601.01798·cs.CV·January 6, 2026

TL;DR

This paper presents VerLM, a novel vision-language model for face verification that not only accurately identifies matches but also provides natural language explanations for its decisions, enhancing transparency and interpretability.

Contribution

Introduces a cross-modal vision-language model for face verification that offers explicit natural language explanations, improving transparency and accuracy over existing methods.

Findings

01. Outperforms baseline face verification models
02. Provides both concise and detailed explanations
03. Enhances interpretability and reliability of face verification

Abstract

Face verification systems have seen substantial advancements; however, they often lack transparency in their decision-making processes. In this paper, we introduce an innovative Vision-Language Model (VLM) for Face Verification, which not only accurately determines if two face images depict the same individual but also explicitly explains the rationale behind its decisions. Our model is uniquely trained using two complementary explanation styles: (1) concise explanations that summarize the key factors influencing its decision, and (2) comprehensive explanations detailing the specific differences observed between the images. We adapt and enhance a state-of-the-art modeling approach originally designed for audio-based differentiation to suit visual inputs effectively. This cross-modal transfer significantly improves our model's accuracy and interpretability. The proposed VLM integrates…

Taxonomy

Topics: Face recognition and analysis · Face Recognition and Perception · Generative Adversarial Networks and Image Synthesis