CEM-Net: Cross-Emotion Memory Network for Emotional Talking Face Generation
Kangyi Wu, Pengna Li, Jingwen Fu, Yang Wu, Yuhan Liu, Sanping Zhou, Jinjun Wang

TL;DR
CEM-Net is a framework for emotional talking face generation that strengthens the emotion signal in the driving audio and bridges the emotion gap left by reference images whose emotion conflicts with the audio, resulting in more emotion-accurate and natural lip-synced videos.
Contribution
The paper introduces CEM-Net, combining an Audio Emotion Enhancement module and an Emotion Bridging Memory module to address emotion inconsistency issues in talking face generation.
Findings
CEM-Net achieves higher emotion accuracy in generated videos.
The method produces more natural and expressive talking faces.
Extensive experiments validate the effectiveness of the proposed modules.
Abstract
Emotional talking face generation aims to animate a human face in given reference images and generate a talking video that matches the content and emotion of the driving audio. However, existing methods neglect that reference images may carry a strong emotion that conflicts with the audio emotion, leading to severe emotion inaccuracy and distorted results. To tackle this issue, we introduce a cross-emotion memory network (CEM-Net), designed to generate emotional talking faces aligned with the driving audio when reference images exhibit strong emotion. Specifically, an Audio Emotion Enhancement (AEE) module is first devised with a cross-reconstruction training strategy to enhance audio emotion, overcoming the disruption from reference image emotion. Secondly, since reference images cannot provide sufficient facial motion information of the speaker under audio emotion, an Emotion…
Taxonomy
Topics: Face recognition and analysis · Generative Adversarial Networks and Image Synthesis · Image Retrieval and Classification Techniques
