LaCoVL-FER: Landmark-Guided Contrastive Learning Network with Vision-Language Enhancement for Facial Expression Recognition

Jiaxin Wang, Muwei Jian, Hui Yu, Junyu Dong, and Yifan Xia

arXiv:2605.19821 · cs.CV · May 20, 2026

TL;DR

LaCoVL-FER is a novel facial expression recognition framework that combines landmark-guided geometric features with vision-language models to improve robustness and accuracy in complex real-world scenarios.

Contribution

The paper introduces a Landmark-Guided Adaptive Encoder (LGAE) and a vision-language enhancement strategy that fuse geometric priors from facial landmarks with semantic priors from a vision-language model for FER.

Findings

01 Outperforms state-of-the-art methods on the RAF-DB, FERPlus, and AffectNet datasets.

02 Focuses attention on key facial regions and suppresses noise.

03 Improves the generalization and robustness of FER models.

Abstract

Facial Expression Recognition (FER) in the wild is still challenging due to uncontrolled variations in pose, occlusion, and illumination. Most existing attention-based methods primarily rely on visual appearance cues, suffering from attention redundancy and instability, which limits their performance in complex scenarios. To address these issues, we propose a novel landmark-guided contrastive learning network with vision-language enhancement for FER (LaCoVL-FER), which integrates geometric priors from facial landmarks and semantic priors from a vision-language model. Specifically, a Landmark-Guided Adaptive Encoder (LGAE) is designed to introduce geometric priors through a Bi-branch Gated Cross Attention (BGCA) mechanism, which achieves adaptive fusion of landmark-based geometric and visual appearance features to produce expression-relevant features, thereby focusing on key facial…
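The abstract names a Bi-branch Gated Cross Attention (BGCA) mechanism but the listing does not give its equations. The sketch below is only an illustration of the general idea of gated fusion between two cross-attention branches, not the authors' implementation. Everything in it is an assumption: the helper names (`cross_attention`, `gated_fusion`), the equal token count for the two feature sets, the single gate vector `gate_w`, and the convex blend of the two branches. It uses only the Python standard library so it can be run as-is.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, context):
    """Scaled dot-product attention: each query token attends over the context tokens."""
    scale = math.sqrt(len(queries[0]))
    context_t = [list(col) for col in zip(*context)]
    scores = matmul(queries, context_t)
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, context)


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gated_fusion(visual, landmark, gate_w):
    """Hypothetical bi-branch fusion (assumed design, not taken from the paper).

    Branch 1: appearance tokens query the landmark tokens.
    Branch 2: landmark tokens query the appearance tokens.
    A per-token sigmoid gate then blends the two branch outputs.
    Assumes both inputs have the same number of tokens and feature size.
    """
    vis_branch = cross_attention(visual, landmark)
    lmk_branch = cross_attention(landmark, visual)
    fused, gates = [], []
    for a, b in zip(vis_branch, lmk_branch):
        # Gate computed from the concatenated branch outputs for this token.
        g = sigmoid(sum(w * x for w, x in zip(gate_w, a + b)))
        gates.append(g)
        fused.append([g * x + (1.0 - g) * y for x, y in zip(a, b)])
    return fused, gates


# Toy inputs: 4 tokens with 8 features each, random values for illustration only.
random.seed(0)
N, D = 4, 8
visual = [[random.gauss(0, 1) for _ in range(D)] for _ in range(N)]
landmark = [[random.gauss(0, 1) for _ in range(D)] for _ in range(N)]
gate_w = [random.gauss(0, 0.1) for _ in range(2 * D)]

fused, gates = gated_fusion(visual, landmark, gate_w)
```

Because the gate lies strictly between 0 and 1, each fused feature is a convex combination of the two branch outputs, which is one simple way to let a model weight geometric against appearance evidence per token. In a trained network the gate weights and the query, key, and value projections would be learned; they are omitted or randomized here.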

Code & Models

Repositories

ylin06804/LaCoVL-FER (GitHub)