Personalized Cross-Modal Emotional Correlation Learning for Speech-Preserving Facial Expression Manipulation
Tianshui Chen, Yujie Zhu, Jianman Lin, Zhijing Yang, Chunmei Qing, Feng Gao, Liang Lin

TL;DR
This paper introduces PCMECL, a method that improves speech-preserving facial expression manipulation (SPFEM) by personalizing emotion prompts and aligning visual and semantic features through cross-modal learning, which compensates for the scarcity of paired training data.
Contribution
The proposed PCMECL algorithm personalizes the emotion prompts for each individual's expressions and aligns visual and semantic features through feature differencing, which strengthens the supervision available for SPFEM.
Findings
PCMECL outperforms existing methods across multiple datasets.
Personalized prompts capture individual expressive variations.
Feature differencing improves modality alignment and manipulation quality.
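The summary above does not spell out how feature differencing is computed. As a rough illustration only, the sketch below assumes a CLIP-style directional formulation: the change in visual features between a source frame and a manipulated frame is compared with the change in text features between the source-emotion prompt and the target-emotion prompt. The function names (`differencing_loss`, `cosine`, `sub`) and the toy vectors are hypothetical and are not taken from the paper; real features would come from a VLM's image and text encoders.

```python
import math

def sub(a, b):
    """Element-wise difference a - b of two equal-length vectors."""
    return [x - y for x, y in zip(a, b)]

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)

def differencing_loss(img_src, img_out, txt_src, txt_tgt):
    """Hypothetical directional loss (an assumption, not the paper's definition).

    Small when the visual change (img_out - img_src) points the same way
    as the semantic change (txt_tgt - txt_src); ranges from 0 to 2.
    """
    visual_delta = sub(img_out, img_src)
    semantic_delta = sub(txt_tgt, txt_src)
    return 1.0 - cosine(visual_delta, semantic_delta)

# Toy 3-d "features" standing in for encoder outputs (made-up numbers).
img_src = [1.0, 0.0, 0.5]
txt_src = [0.2, 0.1, 0.0]
txt_tgt = [0.2, 0.9, 0.0]

aligned = differencing_loss(img_src, [1.0, 1.0, 0.5], txt_src, txt_tgt)
opposed = differencing_loss(img_src, [1.0, -1.0, 0.5], txt_src, txt_tgt)
print(aligned, opposed)  # aligned is near 0, opposed is near 2
```

Taking differences removes whatever the two frames (or the two prompts) have in common, such as identity or phrasing, so under this assumption the loss responds only to the part of the features that changes with the emotion.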
Abstract
Speech-preserving facial expression manipulation (SPFEM) aims to enhance human expressiveness without altering mouth movements tied to the original speech. A primary challenge in this domain is the scarcity of paired data, namely aligned frames of the same individual with identical speech but different expressions, which impedes direct supervision for emotional manipulation. While current Visual-Language Models (VLMs) can extract aligned visual and semantic features, making them a promising source of supervision, their direct application is limited. To this end, we propose a Personalized Cross-Modal Emotional Correlation Learning (PCMECL) algorithm that refines VLM-based supervision through two major improvements. First, standard VLMs rely on a single generic prompt for each emotion, failing to capture expressive variations among individuals. PCMECL addresses this limitation by…
