Foundation versus Domain-specific Models: Performance Comparison, Fusion, and Explainability in Face Recognition

Redwan Sony; Parisa Farmanifard; Arun Ross; Anil K. Jain

arXiv:2507.03541·cs.CV·August 12, 2025

TL;DR

This paper compares generic foundation models with domain-specific face recognition models. Domain-specific models outperform zero-shot foundation models, but fusing the two improves accuracy at low false match rates, and foundation models add explainability to face recognition decisions.

Contribution

The paper provides a comprehensive performance comparison, demonstrates the benefits of score-level fusion, and highlights the explainability capabilities of foundation models in face recognition.

Findings

01

Domain-specific models outperform zero-shot foundation models on every face benchmark dataset considered.

02

Fusion of foundation and domain-specific models improves accuracy at low false match rates.

03

Foundation models like GPT-4o offer explainability and can resolve low-confidence decisions.

Abstract

In this paper, we address the following question: How do generic foundation models (e.g., CLIP, BLIP, GPT-4o, Grok-4) compare against a domain-specific face recognition model (viz., AdaFace or ArcFace) on the face recognition task? Through a series of experiments involving several foundation models and benchmark datasets, we report the following findings: (a) In all face benchmark datasets considered, domain-specific models outperformed zero-shot foundation models. (b) The performance of zero-shot generic foundation models improved on over-segmented face images compared to tightly cropped faces, thereby suggesting the importance of contextual clues. (c) A simple score-level fusion of a foundation model with a domain-specific face recognition model improved the accuracy at low false match rates. (d) Foundation models, such as GPT-4o and Grok-4, are able to provide explainability to the…
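Finding (c) can be illustrated with a minimal sketch of score-level fusion. The abstract says only that the fusion is "simple" and score-level, so everything specific here is an assumption made for illustration: the z-score normalization, the weighted-sum rule, the default weight of 0.5, and the helper names `z_normalize`, `fuse_scores`, and `tmr_at_fmr` are hypothetical, not the authors' implementation.

```python
from statistics import mean, pstdev


def z_normalize(scores):
    """Rescale scores to zero mean and unit variance (assumed normalization)."""
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        # All scores identical: no spread to normalize.
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def fuse_scores(domain_scores, foundation_scores, weight=0.5):
    """Weighted-sum fusion of two matchers' similarity scores.

    `weight` is the share given to the domain-specific matcher; the value
    0.5 is an illustrative default, not a setting taken from the paper.
    """
    if len(domain_scores) != len(foundation_scores):
        raise ValueError("both matchers must score the same image pairs")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    d = z_normalize(domain_scores)
    f = z_normalize(foundation_scores)
    return [weight * a + (1.0 - weight) * b for a, b in zip(d, f)]


def tmr_at_fmr(scores, labels, target_fmr):
    """True match rate at a threshold whose false match rate is <= target_fmr.

    labels: 1 for a genuine (same-identity) pair, 0 for an impostor pair.
    """
    genuine = [s for s, y in zip(scores, labels) if y == 1]
    impostor = sorted((s for s, y in zip(scores, labels) if y == 0), reverse=True)
    if not genuine or not impostor:
        raise ValueError("need at least one genuine and one impostor pair")
    # Number of impostor pairs that may be accepted without exceeding the target.
    allowed = int(target_fmr * len(impostor))
    # Accept only scores strictly above the threshold, so ties are rejected.
    threshold = impostor[allowed] if allowed < len(impostor) else float("-inf")
    return sum(s > threshold for s in genuine) / len(genuine)


if __name__ == "__main__":
    # Toy similarity scores for four image pairs (made-up numbers).
    labels = [1, 1, 0, 0]
    domain = [0.9, 0.8, 0.2, 0.1]       # e.g. a domain-specific matcher
    foundation = [0.7, 0.9, 0.3, 0.1]   # e.g. a zero-shot foundation model
    fused = fuse_scores(domain, foundation)
    print(tmr_at_fmr(fused, labels, target_fmr=0.0))
```

Normalizing before summing keeps either matcher from dominating just because its raw scores span a wider range. Evaluating the true match rate at a fixed low false match rate mirrors the operating point the abstract refers to.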

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques