Unsupervised Multiview Contrastive Language-Image Joint Learning with Pseudo-Labeled Prompts Via Vision-Language Model for 3D/4D Facial Expression Recognition

Muzammil Behzad

arXiv:2505.09336·cs.CV·May 15, 2025

TL;DR

This paper presents MultiviewVLM, an unsupervised vision-language model that learns multiview representations of facial emotions from 3D/4D data using pseudo-labeled text prompts and contrastive learning. The authors report that it outperforms existing state-of-the-art methods.

Contribution

The paper introduces a novel multiview contrastive learning framework with pseudo-labels and a gradient-friendly loss for scalable, unsupervised facial emotion recognition from 3D/4D data.
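
To make the idea of prompt-derived pseudo-labels concrete, here is a minimal sketch, assuming each face view and each text prompt has already been encoded into a shared embedding space. The function name `assign_pseudo_labels`, the toy vectors, and the nearest-prompt rule are illustrative assumptions, not the paper's actual procedure.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def assign_pseudo_labels(image_emb, prompt_emb):
    """Give each image the index of its most similar prompt (hypothetical rule).

    image_emb:  (n_images, d) array of image embeddings
    prompt_emb: (n_prompts, d) array of text-prompt embeddings
    Returns the pseudo-label indices and the full cosine-similarity matrix.
    """
    sims = l2_normalize(image_emb) @ l2_normalize(prompt_emb).T
    return sims.argmax(axis=1), sims

# Toy embeddings standing in for encoder outputs (assumed, not real data).
prompt_emb = np.array([[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0],
                       [0.0, 0.0, 1.0]])
image_emb = np.array([[0.9, 0.1, 0.0],
                      [0.1, 0.2, 2.0]])

labels, sims = assign_pseudo_labels(image_emb, prompt_emb)
print(labels)
```

In a real pipeline the embeddings would come from the vision and language encoders; the argmax step is only one plausible way to turn prompt similarity into a label.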

Findings

01. Outperforms existing state-of-the-art methods

02. Effective in real-world applications with minimal modifications

03. Scalable to distributed training environments

Abstract

In this paper, we introduce MultiviewVLM, a vision-language model designed for unsupervised contrastive multiview representation learning of facial emotions from 3D/4D data. Our architecture integrates pseudo-labels derived from generated textual prompts to guide implicit alignment of emotional semantics. To capture shared information across multi-views, we propose a joint embedding space that aligns multiview representations without requiring explicit supervision. We further enhance the discriminability of our model through a novel multiview contrastive learning strategy that leverages stable positive-negative pair sampling. A gradient-friendly loss function is introduced to promote smoother and more stable convergence, and the model is optimized for distributed training to ensure scalability. Extensive experiments demonstrate that MultiviewVLM outperforms existing state-of-the-art…
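
The abstract does not spell out the contrastive objective, so the following is only a sketch of the general technique: a standard symmetric InfoNCE loss between two views of the same batch, where matching rows are positives and all other rows are negatives. The function `info_nce`, the temperature value, and the random toy data are assumptions for illustration; this is not the paper's gradient-friendly loss.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def log_softmax(logits):
    """Row-wise log-softmax, shifted by the row max for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def info_nce(view_a, view_b, temperature=0.1):
    """Symmetric InfoNCE between two views; row i of each view is a positive pair."""
    a, b = l2_normalize(view_a), l2_normalize(view_b)
    logits = a @ b.T / temperature          # (n, n) scaled cosine similarities
    idx = np.arange(len(a))
    loss_ab = -log_softmax(logits)[idx, idx].mean()    # view A -> view B
    loss_ba = -log_softmax(logits.T)[idx, idx].mean()  # view B -> view A
    return 0.5 * (loss_ab + loss_ba)

# Toy data: the second view is a slightly perturbed copy of the first.
rng = np.random.default_rng(0)
base = rng.normal(size=(8, 16))
aligned = info_nce(base, base + 0.01 * rng.normal(size=base.shape))
shuffled = info_nce(base, base[::-1].copy())  # positives deliberately mismatched
print(aligned < shuffled)
```

The loss is low when corresponding views sit close together in the joint embedding space and high when they are mismatched, which is the behavior a multiview alignment objective relies on.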

Taxonomy

Topics: Brain Tumor Detection and Classification

Methods: Contrastive Learning