CEM-Net: Cross-Emotion Memory Network for Emotional Talking Face Generation

Kangyi Wu; Pengna Li; Jingwen Fu; Yang Wu; Yuhan Liu; Sanping Zhou; Jinjun Wang

arXiv:2508.12368 · cs.MM · August 19, 2025

TL;DR

CEM-Net is a novel framework that improves emotional talking face generation by enhancing audio emotion and bridging emotion gaps from reference images, resulting in more accurate and natural lip-synced videos.

Contribution

The paper introduces CEM-Net, combining an Audio Emotion Enhancement module and an Emotion Bridging Memory module to address emotion inconsistency issues in talking face generation.

Findings

01. CEM-Net achieves higher emotion accuracy in generated videos.

02. The method produces more natural and expressive talking faces.

03. Extensive experiments validate the effectiveness of the proposed modules.

Abstract

Emotional talking face generation aims to animate a human face in given reference images and generate a talking video that matches the content and emotion of driving audio. However, existing methods neglect that reference images may have a strong emotion that conflicts with the audio emotion, leading to severe emotion inaccuracy and distorted generated results. To tackle the issue, we introduce a cross-emotion memory network (CEM-Net), designed to generate emotional talking faces aligned with the driving audio when reference images exhibit strong emotion. Specifically, an Audio Emotion Enhancement module (AEE) is first devised with the cross-reconstruction training strategy to enhance audio emotion, overcoming the disruption from reference image emotion. Secondly, since reference images cannot provide sufficient facial motion information of the speaker under audio emotion, an Emotion…
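The abstract names a cross-reconstruction training strategy but does not give its details here. As a rough illustration of the general idea only (not the authors' implementation), the sketch below assumes two samples with different content and different emotion, splits each into a content code and an emotion code, swaps the emotion codes, and penalizes the distance to the corresponding ground-truth samples. Every name in it (`cross_reconstruction_loss`, `enc_content`, `enc_emotion`, `decode`, the toy 4-dimensional features) is hypothetical.

```python
from typing import Callable, List

Vec = List[float]


def l1(a: Vec, b: Vec) -> float:
    """Mean absolute difference between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cross_reconstruction_loss(
    enc_content: Callable[[Vec], Vec],   # hypothetical content encoder
    enc_emotion: Callable[[Vec], Vec],   # hypothetical emotion encoder
    decode: Callable[[Vec, Vec], Vec],   # hypothetical decoder
    x_im: Vec,  # sample with content i, emotion m
    x_jn: Vec,  # sample with content j, emotion n
    x_in: Vec,  # ground truth: content i, emotion n
    x_jm: Vec,  # ground truth: content j, emotion m
) -> float:
    # Encode content and emotion separately for both inputs.
    c_i, c_j = enc_content(x_im), enc_content(x_jn)
    e_m, e_n = enc_emotion(x_im), enc_emotion(x_jn)
    # Swap the emotion codes and decode the crossed pairs.
    rec_in = decode(c_i, e_n)
    rec_jm = decode(c_j, e_m)
    # The loss is small only if content and emotion are cleanly separated.
    return l1(rec_in, x_in) + l1(rec_jm, x_jm)


# Toy features (an assumption): first two dims are content, last two are emotion.
x_im = [1.0, 2.0, 0.0, 1.0]
x_jn = [3.0, 4.0, 1.0, 0.0]
x_in = [1.0, 2.0, 1.0, 0.0]
x_jm = [3.0, 4.0, 0.0, 1.0]


def concat(c: Vec, e: Vec) -> Vec:
    """Toy decoder: concatenate the content and emotion codes."""
    return c + e


# Perfectly disentangled encoders reconstruct the crossed samples exactly.
ideal = cross_reconstruction_loss(
    lambda x: x[:2], lambda x: x[2:], concat, x_im, x_jn, x_in, x_jm
)

# An emotion encoder that leaks a content dimension is penalized.
leaky = cross_reconstruction_loss(
    lambda x: x[:2], lambda x: [x[0], x[3]], concat, x_im, x_jn, x_in, x_jm
)

print(ideal, leaky)
```

In this toy setting the disentangled encoders give zero loss and the leaky emotion encoder gives a positive loss, which is the pressure a cross-reconstruction objective is meant to apply. A real model would use learned audio encoders and a learned decoder in place of the slicing and concatenation used here.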

Taxonomy

Topics: Face recognition and analysis · Generative Adversarial Networks and Image Synthesis · Image Retrieval and Classification Techniques