FaceLLM: A Multimodal Large Language Model for Face Understanding

Hatef Otroshi Shahreza; Sébastien Marcel

arXiv:2507.10300·cs.CV·July 15, 2025

TL;DR

FaceLLM is a specialized multimodal large language model trained on a novel face-focused dataset, significantly improving performance on facial understanding tasks by leveraging synthetic supervision from language models.

Contribution

This work introduces FaceLLM and a new dataset, FairFaceGPT, enabling domain-specific facial understanding with weakly supervised question-answer pairs.

Findings

01. FaceLLM achieves state-of-the-art results on face-centric tasks.

02. The weakly supervised pipeline effectively generates high-quality training data.

03. Synthetic supervision enhances domain-specific multimodal model performance.

Abstract

Multimodal large language models (MLLMs) have shown remarkable performance in vision-language tasks. However, existing MLLMs are primarily trained on generic datasets, limiting their ability to reason on domain-specific visual cues such as those in facial images. In particular, tasks that require detailed understanding of facial structure, expression, emotion, and demographic features remain underexplored by MLLMs due to the lack of large-scale annotated face image-text datasets. In this work, we introduce FaceLLM, a multimodal large language model trained specifically for facial image understanding. To construct the training data, we propose a novel weakly supervised pipeline that uses ChatGPT with attribute-aware prompts to generate high-quality question-answer pairs based on images from the FairFace dataset. The resulting corpus, called FairFaceGPT, covers a diverse set of attributes…
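The attribute-aware prompting step described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `FaceRecord` fields are modeled on the age, gender, and race labels that FairFace provides, the prompt wording is hypothetical rather than the template used by the authors, and the `Q:`/`A:` response format is an invented convention. No model is called; the sketch only shows how known labels could condition the request and how a text response could be turned into question-answer pairs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FaceRecord:
    """Labels for one image. Field names are assumptions modeled on
    FairFace's age, gender, and race annotations."""
    image_path: str
    age_group: str
    gender: str
    race: str


def build_attribute_aware_prompt(record: FaceRecord, n_questions: int = 3) -> str:
    """Build a text prompt that conditions question-answer generation on known labels.

    The wording is hypothetical, not the template used in the paper.
    """
    if n_questions < 1:
        raise ValueError("n_questions must be at least 1")
    hints = (
        f"age group: {record.age_group}; "
        f"gender: {record.gender}; "
        f"race: {record.race}"
    )
    return (
        "You are annotating a face image.\n"
        f"Known attributes ({hints}).\n"
        f"Write {n_questions} question-answer pairs about the face, "
        "staying consistent with the known attributes.\n"
        "Format each pair as 'Q: ...' followed by 'A: ...'."
    )


def parse_qa_pairs(response_text: str) -> list[tuple[str, str]]:
    """Parse 'Q:' / 'A:' lines into (question, answer) tuples.

    Lines that match neither prefix are ignored, and a question
    with no following answer is dropped.
    """
    pairs = []
    question = None
    for line in response_text.splitlines():
        line = line.strip()
        if line.startswith("Q:"):
            question = line[2:].strip()
        elif line.startswith("A:") and question is not None:
            pairs.append((question, line[2:].strip()))
            question = None
    return pairs
```

In a full pipeline the prompt would be sent to the language model together with the image, and the parsed pairs would be stored alongside `image_path` as training examples. Supplying the known labels in the prompt is one way to keep generated answers consistent with the dataset's annotations, which is what makes the supervision weak rather than fully manual.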

Taxonomy

Topics: Face recognition and analysis