Unsupervised Multiview Contrastive Language-Image Joint Learning with Pseudo-Labeled Prompts Via Vision-Language Model for 3D/4D Facial Expression Recognition
Muzammil Behzad

TL;DR
This paper presents MultiviewVLM, an unsupervised vision-language model that learns multiview facial emotion representations from 3D/4D data using pseudo-labeled prompts and contrastive learning, and is reported to outperform existing state-of-the-art methods.
Contribution
The paper introduces a novel multiview contrastive learning framework with pseudo-labels and a gradient-friendly loss for scalable, unsupervised facial emotion recognition from 3D/4D data.
Findings
Reported to outperform existing state-of-the-art methods
Reported to be effective in real-world applications with minimal modifications
Optimized for distributed training, which supports scalability
Abstract
In this paper, we introduce MultiviewVLM, a vision-language model designed for unsupervised contrastive multiview representation learning of facial emotions from 3D/4D data. Our architecture integrates pseudo-labels derived from generated textual prompts to guide implicit alignment of emotional semantics. To capture information shared across multiple views, we propose a joint embedding space that aligns multiview representations without requiring explicit supervision. We further enhance the discriminability of our model through a novel multiview contrastive learning strategy that leverages stable positive-negative pair sampling. A gradient-friendly loss function is introduced to promote smoother and more stable convergence, and the model is optimized for distributed training to ensure scalability. Extensive experiments demonstrate that MultiviewVLM outperforms existing state-of-the-art…
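The abstract does not give the loss or the pseudo-labeling rule, so the following is only a minimal sketch of the general idea, under stated assumptions: a standard symmetric InfoNCE loss stands in for the paper's gradient-friendly loss, pseudo-labels are taken as the nearest text-prompt embedding by cosine similarity, and all function names, shapes, and the temperature value are hypothetical.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def log_softmax(logits):
    """Numerically stable row-wise log-softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def pseudo_labels(view_emb, prompt_emb):
    """Assumed rule: average the normalized view embeddings of each sample
    and pick the closest prompt embedding by cosine similarity.
    view_emb: (V, N, D) for V views and N samples; prompt_emb: (C, D)."""
    fused = l2_normalize(l2_normalize(view_emb).mean(axis=0))
    sims = fused @ l2_normalize(prompt_emb).T  # (N, C)
    return sims.argmax(axis=1)

def info_nce(a, b, temperature=0.07):
    """Symmetric InfoNCE: row i of `a` and row i of `b` form the positive
    pair; every other row in the batch acts as a negative."""
    a, b = l2_normalize(a), l2_normalize(b)
    logits = a @ b.T / temperature
    idx = np.arange(len(a))
    loss_ab = -log_softmax(logits)[idx, idx].mean()
    loss_ba = -log_softmax(logits.T)[idx, idx].mean()
    return 0.5 * (loss_ab + loss_ba)

def multiview_loss(view_emb, temperature=0.07):
    """Average the pairwise InfoNCE loss over all distinct pairs of views."""
    n_views = view_emb.shape[0]
    pairs = [(i, j) for i in range(n_views) for j in range(i + 1, n_views)]
    losses = [info_nce(view_emb[i], view_emb[j], temperature) for i, j in pairs]
    return float(np.mean(losses))
```

In this sketch, two views whose embeddings agree sample by sample give a loss near zero, while views whose rows are misaligned give a large loss, which is the behavior a joint multiview embedding space is meant to encourage.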
Taxonomy
Topics: Brain Tumor Detection and Classification
Methods: Contrastive Learning
