Facial Emotion Learning with Text-Guided Multiview Fusion via Vision-Language Model for 3D/4D Facial Expression Recognition

Muzammil Behzad

arXiv:2507.01673·cs.CV·July 3, 2025

TL;DR

This paper introduces FACET-VLM, a vision-language framework for 3D/4D facial expression recognition that combines multiview facial representation learning with semantic guidance from natural language prompts. It reports state-of-the-art accuracy on benchmarks including BU-3DFE, Bosphorus, and BU-4DFE.

Contribution

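The paper proposes a multiview fusion framework built from three components: Cross-View Semantic Aggregation (CVSA) for view-consistent fusion, Multiview Text-Guided Fusion (MTGF) for semantically aligned facial emotions, and a multiview consistency loss that enforces structural coherence across views.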

Findings

01. Achieves state-of-the-art accuracy on multiple benchmarks, including BU-3DFE, Bosphorus, and BU-4DFE.

02. Effectively captures subtle micro-expressions in 4D data.

03. Demonstrates robustness across posed and spontaneous expressions.

Abstract

Facial expression recognition (FER) in 3D and 4D domains presents a significant challenge in affective computing due to the complexity of spatial and temporal facial dynamics. Its success is crucial for advancing applications in human behavior understanding, healthcare monitoring, and human-computer interaction. In this work, we propose FACET-VLM, a vision-language framework for 3D/4D FER that integrates multiview facial representation learning with semantic guidance from natural language prompts. FACET-VLM introduces three key components: Cross-View Semantic Aggregation (CVSA) for view-consistent fusion, Multiview Text-Guided Fusion (MTGF) for semantically aligned facial emotions, and a multiview consistency loss to enforce structural coherence across views. Our model achieves state-of-the-art accuracy across multiple benchmarks, including BU-3DFE, Bosphorus, BU-4DFE, and…
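The abstract names a text-guided fusion step and a multiview consistency loss but does not give their formulas. The following is only a minimal, generic sketch of what such components often look like, not the paper's implementation. Everything in it is an assumption made for illustration: the function names, the use of cosine similarity, the softmax weighting, and the `temperature` parameter are all hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def multiview_consistency_loss(views):
    """Assumed form: mean (1 - cosine similarity) over all unordered view pairs.

    The loss is 0 when every view embedding points in the same direction
    and grows as the views disagree.
    """
    pairs = [(i, j) for i in range(len(views)) for j in range(i + 1, len(views))]
    if not pairs:
        return 0.0
    return sum(1.0 - cosine(views[i], views[j]) for i, j in pairs) / len(pairs)


def text_guided_fusion(views, text_embedding, temperature=1.0):
    """Assumed form: weight each view by its similarity to a text embedding.

    Similarities are passed through a softmax, and the fused feature is the
    weighted sum of the view embeddings. Returns (fused, weights).
    """
    scores = [cosine(v, text_embedding) / temperature for v in views]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(views[0])
    fused = [sum(w * v[d] for w, v in zip(weights, views)) for d in range(dim)]
    return fused, weights


if __name__ == "__main__":
    # Toy 2-D embeddings for two views and one prompt (made-up values).
    views = [[1.0, 0.0], [0.0, 1.0]]
    prompt = [1.0, 0.0]
    fused, weights = text_guided_fusion(views, prompt)
    print("weights:", weights)
    print("fused:", fused)
    print("consistency loss:", multiview_consistency_loss(views))
```

In this toy setup the view that is more similar to the prompt embedding receives the larger fusion weight, while the consistency term penalizes the two views for disagreeing. In a real model both terms would operate on learned features and be combined with a classification loss.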

Taxonomy

Topics: Emotion and Mood Recognition · Face Recognition and Analysis · Face Recognition and Perception