Cross-Modal Emotion Transfer for Emotion Editing in Talking Face Video

Chanhyuk Choi; Taesoo Kim; Donggyu Lee; Siyeol Jung; Taehwan Kim

arXiv:2604.07786 · cs.CV · April 20, 2026


TL;DR

This paper introduces C-MET, a cross-modal emotion transfer method that enhances emotion expressiveness in talking face videos by modeling emotion semantic vectors across speech and visual modalities.

Contribution

The paper proposes a novel cross-modal approach that uses large-scale pretrained encoders to transfer extended and nuanced emotions more effectively in talking face synthesis.
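To make the idea of an "emotion semantic vector" more concrete, here is a minimal sketch under loudly stated assumptions. It does not reproduce C-MET: this summary does not say how the vectors are computed or transferred between modalities. The sketch assumes, hypothetically, that an emotion direction is the difference between the mean embeddings of emotional and neutral speech produced by some pretrained encoder, and that a given matrix `w` projects speech-space directions into a visual embedding space. All function names (`mean`, `emotion_vector`, `matvec`, `edit_visual`), the `strength` parameter, and the toy numbers are illustrative only.

```python
# Hypothetical illustration of emotion semantic vectors.
# NOT the C-MET implementation; embeddings and the projection `w` are assumed inputs.
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def mean(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of equal-length embedding vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def emotion_vector(emotional: Sequence[Sequence[float]],
                   neutral: Sequence[Sequence[float]]) -> Vector:
    """Assumed emotion direction: mean(emotional) minus mean(neutral)."""
    return [e - n for e, n in zip(mean(emotional), mean(neutral))]


def matvec(w: Matrix, v: Sequence[float]) -> Vector:
    """Multiply matrix `w` (rows x len(v)) by vector `v`."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def edit_visual(visual: Sequence[float], speech_dir: Sequence[float],
                w: Matrix, strength: float = 1.0) -> Vector:
    """Shift a visual embedding along a speech-derived emotion direction.

    `w` is a hypothetical speech-to-visual projection (e.g. learned elsewhere);
    `strength` scales how far the embedding moves along the mapped direction.
    """
    mapped = matvec(w, speech_dir)
    return [x + strength * d for x, d in zip(visual, mapped)]


if __name__ == "__main__":
    # Toy 3-D "speech" embeddings and a 2-D "visual" embedding.
    happy_speech = [[1.0, 2.0, 0.0], [3.0, 2.0, 0.0]]
    neutral_speech = [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]
    direction = emotion_vector(happy_speech, neutral_speech)
    w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # assumed 2x3 projection
    print(edit_visual([0.5, 0.5], direction, w, strength=0.5))
```

Taking a difference of class means is one simple, common way to isolate an attribute direction in an embedding space; any learned cross-modal mapping could stand in for the fixed `w` used here.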

Findings

1. Improves emotion accuracy by 14% over state-of-the-art methods.
2. Generates expressive talking face videos for unseen extended emotions.
3. Demonstrates effectiveness on MEAD and CREMA-D datasets.

Abstract

Talking face generation has gained significant attention as a core application of generative models. Emotion editing in talking face video plays a crucial role in enhancing the expressiveness and realism of synthesized videos. However, existing approaches often limit expressive flexibility and struggle to generate extended emotions. Label-based methods represent emotions with discrete categories, which cannot capture a wide range of emotions. Audio-based methods can leverage emotionally rich speech signals, and can even benefit from expressive text-to-speech (TTS) synthesis, but they fail to express the target emotions because emotion and linguistic content are entangled in emotional speech. Image-based methods, on the other hand, rely on target reference images to guide emotion transfer, yet they require high-quality frontal views and face challenges in acquiring reference data for…


Code & Models

Models

🤗 coldhyuk/C-MET (model, 4 downloads)
