Personalized Cross-Modal Emotional Correlation Learning for Speech-Preserving Facial Expression Manipulation

Tianshui Chen; Yujie Zhu; Jianman Lin; Zhijing Yang; Chunmei Qing; Feng Gao; Liang Lin

arXiv:2604.25255·cs.CV·April 29, 2026

TL;DR

This paper introduces PCMECL, a method for speech-preserving facial expression manipulation that personalizes emotion prompts and aligns visual and semantic features through cross-modal learning, providing supervision where paired training data is scarce.

Contribution

PCMECL replaces the single generic prompt per emotion with personalized prompts that capture individual expressive variation, and aligns visual and semantic features through feature differencing, strengthening the VLM-based supervision available for SPFEM.
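To make the idea of aligning features "through differencing" concrete, here is a minimal sketch of one common way to do it: compare the direction of change in visual feature space with the direction of change in text feature space. This is an illustration under stated assumptions, not the paper's loss, which this summary does not specify. The function name `differencing_alignment_loss`, the cosine-based objective, and the random vectors standing in for VLM embeddings are all hypothetical.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    # Scale each row to (approximately) unit length.
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def differencing_alignment_loss(src_img, out_img, src_txt, tgt_txt):
    # Hypothetical objective (an assumption, not taken from the paper):
    # the change in visual features (output frame minus source frame)
    # should point the same way as the change in text features
    # (target-emotion prompt minus source-emotion prompt).
    d_img = l2_normalize(out_img - src_img)
    d_txt = l2_normalize(tgt_txt - src_txt)
    cos = np.sum(d_img * d_txt, axis=-1)
    # 0 when the directions agree, 2 when they are opposite.
    return float(np.mean(1.0 - cos))

# Random vectors stand in for VLM image and text embeddings.
rng = np.random.default_rng(0)
src_img = rng.normal(size=(4, 16))
src_txt = rng.normal(size=(4, 16))
tgt_txt = rng.normal(size=(4, 16))

# An output whose visual change follows the text change, and one that opposes it.
aligned_out = src_img + 0.5 * (tgt_txt - src_txt)
opposed_out = src_img - 0.5 * (tgt_txt - src_txt)

loss_aligned = differencing_alignment_loss(src_img, aligned_out, src_txt, tgt_txt)
loss_opposed = differencing_alignment_loss(src_img, opposed_out, src_txt, tgt_txt)
print(loss_aligned, loss_opposed)
```

Working with differences rather than raw features means identity and speech content shared by the source and output frames cancel out, so the loss responds mainly to the change in expression. Whether PCMECL uses this exact formulation is not stated here.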

Findings

01 PCMECL outperforms existing methods across multiple datasets.

02 Personalized prompts capture individual expressive variations.

03 Feature differencing improves modality alignment and manipulation quality.

Abstract

Speech-preserving facial expression manipulation (SPFEM) aims to enhance human expressiveness without altering mouth movements tied to the original speech. A primary challenge in this domain is the scarcity of paired data, namely aligned frames of the same individual with identical speech but different expressions, which impedes direct supervision for emotional manipulation. While current Visual-Language Models (VLMs) can extract aligned visual and semantic features, making them a promising source of supervision, their direct application is limited. To this end, we propose a Personalized Cross-Modal Emotional Correlation Learning (PCMECL) algorithm that refines VLM-based supervision through two major improvements. First, standard VLMs rely on a single generic prompt for each emotion, failing to capture expressive variations among individuals. PCMECL addresses this limitation by…
