Focal-RegionFace: Generating Fine-Grained Multi-attribute Descriptions for Arbitrarily Selected Face Focal Regions

Kaiwen Zheng; Junchen Fu; Songpei Xu; Yaoqing He; Joemon M. Jose; Han Hu; Xuri Ge

arXiv:2601.00156·cs.CV·January 5, 2026

Open Access

TL;DR

This paper presents Focal-RegionFace, a vision-language model fine-tuned from Qwen2.5-VL that generates detailed multi-attribute descriptions (facial action units, emotional state, and age) for arbitrarily selected face regions, together with a new region-level description dataset.

Contribution

The paper introduces a new dataset and a fine-tuned vision-language model for region-specific facial attribute analysis, enabling more precise and interpretable facial state recognition.

Findings

01 Achieves state-of-the-art performance on the new benchmark.

02 Effectively recognizes multiple facial attributes simultaneously.

03 Demonstrates improved interpretability and focus on facial regions.

Abstract

In this paper, we introduce an underexplored problem in facial analysis: generating and recognizing multi-attribute natural language descriptions, covering facial action units (AUs), emotional states, and age estimation, for arbitrarily selected face regions (termed FaceFocalDesc). We argue that a system's ability to focus on individual facial areas leads to better understanding and control. To achieve this capability, we construct a new multi-attribute description dataset for arbitrarily selected face regions, providing rich region-level annotations and natural language descriptions. Further, we propose Focal-RegionFace, a vision-language model fine-tuned from Qwen2.5-VL for facial state analysis, which incrementally refines its focus on localized facial features through multiple progressive fine-tuning stages, resulting in interpretable age estimation, FAU and emotion…
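To make the task setup in the abstract concrete, here is a minimal sketch of how a selected face region and its multi-attribute description might be represented. Everything in it is an assumption made for illustration: the names `FocalRegion`, `RegionAnnotation`, and `build_prompt`, the pixel-box region format, and the prompt wording do not come from the paper and may differ from the authors' dataset schema and prompts.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class FocalRegion:
    """Hypothetical axis-aligned box in pixel coordinates (x0, y0, x1, y1)."""
    x0: int
    y0: int
    x1: int
    y1: int

    def clamp(self, width: int, height: int) -> "FocalRegion":
        # Clip each coordinate to the image bounds, then order the corners
        # so that x0 <= x1 and y0 <= y1.
        x0, x1 = sorted(min(max(v, 0), width) for v in (self.x0, self.x1))
        y0, y1 = sorted(min(max(v, 0), height) for v in (self.y0, self.y1))
        return FocalRegion(x0, y0, x1, y1)

    def as_list(self) -> List[int]:
        return [self.x0, self.y0, self.x1, self.y1]


@dataclass
class RegionAnnotation:
    """Hypothetical region-level record: AUs, emotion, and age for one box."""
    region: FocalRegion
    action_units: List[str] = field(default_factory=list)
    emotion: str = "neutral"
    age: int = 0

    def describe(self) -> str:
        # Flatten the structured attributes into one natural-language line.
        aus = ", ".join(self.action_units) if self.action_units else "no active AUs"
        return (f"Region {self.region.as_list()}: {aus}; "
                f"emotion: {self.emotion}; estimated age: {self.age}.")


def build_prompt(region: FocalRegion) -> str:
    # Illustrative prompt wording only; not the paper's actual prompt.
    return ("Describe the facial action units, emotional state, and apparent "
            f"age visible in the face region {region.as_list()}.")


if __name__ == "__main__":
    box = FocalRegion(-10, 20, 300, 120).clamp(width=256, height=256)
    record = RegionAnnotation(box, ["AU12"], "happiness", 30)
    print(build_prompt(box))
    print(record.describe())
```

The box is clamped before use so that an arbitrarily drawn selection always maps to a valid crop; a mask or polygon would serve equally well if the selected regions are not rectangular.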

Taxonomy

Topics: Face recognition and analysis · Emotion and Mood Recognition · Generative Adversarial Networks and Image Synthesis