Learning Spatial-Temporal Coherent Correlations for Speech-Preserving Facial Expression Manipulation

Tianshui Chen; Jianman Lin; Zhijing Yang; Chunmei Qing; Guangrun Wang; Liang Lin

arXiv:2604.20226·cs.CV·April 23, 2026

TL;DR

This paper introduces a spatial-temporal coherent correlation learning method for speech-preserving facial expression manipulation, using correlations across local facial regions and across frames as supervision.

Contribution

The paper proposes the spatial-temporal coherent correlation learning (STCCL) algorithm, which models spatial-temporal correlations of local facial animations as explicit metrics and uses those metrics to supervise expression manipulation.

Findings

01

The method effectively preserves speech content while modifying facial expressions.

02

The correlation-aware strategy improves focus on challenging facial regions.

03

Experimental results demonstrate enhanced manipulation quality with the proposed approach.

Abstract

Speech-preserving facial expression manipulation (SPFEM) aims to modify facial emotions while meticulously maintaining the mouth animation associated with spoken content. Current works depend on inaccessible paired training samples for the person, where two aligned frames exhibit the same speech content yet differ in emotional expression, limiting the SPFEM applications in real-world scenarios. In this work, we discover that speakers who convey the same content with different emotions exhibit highly correlated local facial animations in both spatial and temporal spaces, providing valuable supervision for SPFEM. To capitalize on this insight, we propose a novel spatial-temporal coherent correlation learning (STCCL) algorithm, which models the aforementioned correlations as explicit metrics and integrates the metrics to supervise manipulating facial expression and meanwhile better…
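To make the idea of spatial and temporal correlations as explicit metrics more concrete, here is a minimal sketch in plain Python. It is not the paper's formulation: the abstract does not specify how the metrics are computed, so the cosine-similarity choice, the per-region feature vectors, and the names `spatial_correlation`, `temporal_correlation`, and `correlation_gap` are all assumptions made for illustration. A clip is represented as a list of frames, each frame as a list of local-region feature vectors.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def spatial_correlation(frame):
    """Hypothetical spatial metric: pairwise similarity between regions of one frame."""
    return [[cosine(p, q) for q in frame] for p in frame]

def temporal_correlation(clip):
    """Hypothetical temporal metric: similarity of each region with itself in the next frame."""
    return [[cosine(p, q) for p, q in zip(cur, nxt)]
            for cur, nxt in zip(clip, clip[1:])]

def correlation_gap(source_clip, generated_clip):
    """Mean absolute difference between the correlation structures of two clips.

    Assumed stand-in for a supervision signal; lower means the generated clip
    keeps more of the source clip's spatial and temporal correlation pattern.
    """
    diffs = []
    for src, gen in zip(source_clip, generated_clip):
        s, g = spatial_correlation(src), spatial_correlation(gen)
        diffs += [abs(a - b) for rs, rg in zip(s, g) for a, b in zip(rs, rg)]
    ts, tg = temporal_correlation(source_clip), temporal_correlation(generated_clip)
    diffs += [abs(a - b) for rs, rg in zip(ts, tg) for a, b in zip(rs, rg)]
    return sum(diffs) / len(diffs)

# Toy clips: 2 frames, 2 regions per frame, 2-D features per region.
source = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]
scaled = [[[2.0, 0.0], [0.0, 3.0]], [[2.0, 0.0], [0.0, 3.0]]]
different = [[[1.0, 0.0], [1.0, 0.0]], [[0.0, 1.0], [1.0, 0.0]]]

print(correlation_gap(source, scaled))
print(correlation_gap(source, different))
```

In this toy setup, rescaling the region features leaves the gap at zero because cosine similarity ignores magnitude, while changing how regions relate to each other within and across frames produces a positive gap. A real system would presumably compute such quantities on learned features and use them as a differentiable training loss.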
