Facial-R1: Aligning Reasoning and Recognition for Facial Emotion Analysis
Jiulong Wu, Yucheng Shen, Lingyong Yan, Haixin Sun, Deguo Xia, Jizhou Huang, Min Cao

TL;DR
Facial-R1 is a three-stage framework for facial emotion analysis that aligns reasoning with recognition, reducing hallucinated explanations and improving interpretability with minimal supervision and a new large-scale dataset.
Contribution
The paper introduces Facial-R1, a three-stage alignment framework accompanied by a new dataset, to address hallucinated reasoning and the misalignment between emotion reasoning and recognition in facial emotion analysis.
Findings
Achieves state-of-the-art performance on FEA benchmarks
Demonstrates strong generalization across datasets
Provides robust interpretability of emotion reasoning
Abstract
Facial Emotion Analysis (FEA) extends traditional facial emotion recognition by incorporating explainable, fine-grained reasoning. The task integrates three subtasks: emotion recognition, facial Action Unit (AU) recognition, and AU-based emotion reasoning to model affective states jointly. While recent approaches leverage Vision-Language Models (VLMs) and achieve promising results, they face two critical limitations: (1) hallucinated reasoning, where VLMs generate plausible but inaccurate explanations due to insufficient emotion-specific knowledge; and (2) misalignment between emotion reasoning and recognition, caused by fragmented connections between observed facial features and final labels. We propose Facial-R1, a three-stage alignment framework that effectively addresses both challenges with minimal supervision. First, we employ instruction fine-tuning to establish basic emotional…
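To make the task structure in the abstract concrete, the sketch below shows one way a single FEA sample could bundle the three subtask outputs (emotion label, AU list, AU-based reasoning), together with a deliberately naive check for the reasoning/recognition misalignment the abstract describes. The class name `FEASample`, its fields, and the function `reasoning_mentions_label` are hypothetical illustrations; they are not the paper's data format or evaluation metric.

```python
from dataclasses import dataclass, field


@dataclass
class FEASample:
    """Hypothetical container for the three FEA subtask outputs."""
    emotion: str                                            # emotion recognition output
    action_units: list[str] = field(default_factory=list)  # AU recognition output
    reasoning: str = ""                                     # AU-based emotion reasoning


def reasoning_mentions_label(sample: FEASample) -> bool:
    """Toy alignment check (an assumption, not the paper's method):
    the reasoning text should at least name the predicted emotion."""
    return sample.emotion.lower() in sample.reasoning.lower()


explanation = "Raised cheeks (AU6) and pulled lip corners (AU12) suggest happiness."

# Reasoning and final label agree.
aligned = FEASample("happiness", ["AU6", "AU12"], explanation)

# Same reasoning, but the final label contradicts it.
misaligned = FEASample("sadness", ["AU6", "AU12"], explanation)

print(reasoning_mentions_label(aligned))
print(reasoning_mentions_label(misaligned))
```

A string match like this only flags the crudest disagreements; it is meant to illustrate what "misalignment" means here, not to measure it.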
Taxonomy
Topics: Emotion and Mood Recognition · Sentiment Analysis and Opinion Mining · Explainable Artificial Intelligence (XAI)
