FaceXBench: Evaluating Multimodal LLMs on Face Understanding

Kartik Narayan; Vibashan VS; Vishal M. Patel

arXiv:2501.10360 · cs.CV · January 21, 2026

TL;DR

FaceXBench is a new benchmark designed to systematically evaluate multimodal large language models on complex face understanding tasks, revealing significant room for improvement in current models.

Contribution

Introduces FaceXBench, a benchmark of 5,000 multiple-choice questions spanning 14 face understanding tasks in 6 broad categories, for evaluating MLLMs.

Findings

01. Current MLLMs perform poorly on face understanding tasks.

02. Models show varying performance across different evaluation settings.

03. FaceXBench highlights the need for specialized face understanding capabilities.

Abstract

Multimodal Large Language Models (MLLMs) demonstrate impressive problem-solving abilities across a wide range of tasks and domains. However, their capacity for face understanding has not been systematically studied. To address this gap, we introduce FaceXBench, a comprehensive benchmark designed to evaluate MLLMs on complex face understanding tasks. FaceXBench includes 5,000 multimodal multiple-choice questions derived from 25 public datasets and a newly created dataset, FaceXAPI. These questions cover 14 tasks across 6 broad categories, assessing MLLMs' face understanding abilities in bias and fairness, face authentication, recognition, analysis, localization and tool retrieval. Using FaceXBench, we conduct an extensive evaluation of 26 open-source MLLMs alongside 2 proprietary models, revealing the unique challenges in complex face understanding tasks. We analyze the models across…
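
Because the benchmark consists of multiple-choice questions grouped into categories, a natural way to report results is accuracy per category plus an overall score. The following is a minimal sketch of that aggregation, not the authors' evaluation code: the record layout (dicts with "category", "answer", and "prediction" keys) and the function name accuracy_by_category are hypothetical, and the sample records are made up for illustration.

```python
from collections import defaultdict


def accuracy_by_category(records):
    """Return {category: accuracy} plus an "overall" entry.

    Each record is a dict with hypothetical keys:
      "category"   - one of the benchmark's broad categories
      "answer"     - the ground-truth option letter, e.g. "A"
      "prediction" - the option letter chosen by the model
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        total[r["category"]] += 1
        # Compare option letters case-insensitively, ignoring whitespace.
        if r["prediction"].strip().upper() == r["answer"].strip().upper():
            correct[r["category"]] += 1

    scores = {c: correct[c] / total[c] for c in total}
    n = sum(total.values())
    scores["overall"] = sum(correct.values()) / n if n else 0.0
    return scores


# Illustrative records only; these are not real benchmark items.
sample = [
    {"category": "recognition", "answer": "A", "prediction": "A"},
    {"category": "recognition", "answer": "C", "prediction": "B"},
    {"category": "localization", "answer": "D", "prediction": "d"},
]
scores = accuracy_by_category(sample)
print(scores)
```

Note that the "overall" entry here is a micro-average over all questions, so categories with more questions weigh more; a macro-average over categories would be an equally plausible choice.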

Code & Models

Repositories

kartik-3004/facexbench (official)

Taxonomy

Topics: Speech Recognition and Synthesis · Face recognition and analysis