FaceXBench: Evaluating Multimodal LLMs on Face Understanding
Kartik Narayan, Vibashan VS, Vishal M. Patel

TL;DR
FaceXBench is a new benchmark designed to systematically evaluate multimodal large language models on complex face understanding tasks, revealing significant room for improvement in current models.
Contribution
Introduction of FaceXBench, a comprehensive benchmark with 5,000 questions across 14 face understanding tasks for evaluating MLLMs.
Findings
Current MLLMs perform poorly on face understanding tasks.
Model performance varies across the evaluation settings.
FaceXBench highlights the need for specialized face understanding capabilities in MLLMs.
Abstract
Multimodal Large Language Models (MLLMs) demonstrate impressive problem-solving abilities across a wide range of tasks and domains. However, their capacity for face understanding has not been systematically studied. To address this gap, we introduce FaceXBench, a comprehensive benchmark designed to evaluate MLLMs on complex face understanding tasks. FaceXBench includes 5,000 multimodal multiple-choice questions derived from 25 public datasets and a newly created dataset, FaceXAPI. These questions cover 14 tasks across 6 broad categories, assessing MLLMs' face understanding abilities in bias and fairness, face authentication, recognition, analysis, localization and tool retrieval. Using FaceXBench, we conduct an extensive evaluation of 26 open-source MLLMs alongside 2 proprietary models, revealing the unique challenges in complex face understanding tasks. We analyze the models across…
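Because the benchmark consists of multiple-choice questions grouped into categories, scoring can be illustrated with a short sketch. The code below is only an assumption-laden illustration, not the authors' evaluation code: the record layout (`id`, `category`, `answer`), the sample questions, and the function name `accuracy_by_category` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical records: the real FaceXBench file format may differ.
QUESTIONS = [
    {"id": 1, "category": "recognition", "answer": "A"},
    {"id": 2, "category": "recognition", "answer": "C"},
    {"id": 3, "category": "localization", "answer": "B"},
    {"id": 4, "category": "tool retrieval", "answer": "D"},
]

# Hypothetical model outputs keyed by question id; question 4 is unanswered.
PREDICTIONS = {1: "A", 2: "b", 3: " b "}


def accuracy_by_category(questions, predictions):
    """Return {category: accuracy}, plus an "overall" entry.

    A missing prediction counts as incorrect. Option letters are compared
    after stripping whitespace and upper-casing.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for q in questions:
        cat = q["category"]
        total[cat] += 1
        total["overall"] += 1
        pred = predictions.get(q["id"])
        if pred is not None and pred.strip().upper() == q["answer"]:
            correct[cat] += 1
            correct["overall"] += 1
    return {cat: correct[cat] / total[cat] for cat in total}


if __name__ == "__main__":
    for cat, acc in accuracy_by_category(QUESTIONS, PREDICTIONS).items():
        print(f"{cat}: {acc:.2f}")
```

Reporting accuracy per category as well as overall keeps a weak area from being hidden behind a strong aggregate score.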
Taxonomy
Topics: Speech Recognition and Synthesis · Face recognition and analysis
