3D Dense Geometry-Guided Facial Expression Synthesis by Adversarial Learning
Rumeysa Bodur, Binod Bhattarai, Tae-Kyun Kim

TL;DR
This paper introduces a GAN-based method for facial expression synthesis that exploits 3D dense geometry (depth and surface normals) to improve manipulation accuracy without requiring paired input-output training data.
Contribution
The authors propose a new GAN framework utilizing 3D dense geometry cues and a large-scale estimated RGB-Depth dataset, along with a confidence regulariser, to enhance facial expression synthesis.
Findings
Outperforms existing methods on AffectNet and RaFD benchmarks.
Relies on depth estimated by an off-the-shelf 3D reconstruction model, so no ground-truth depth maps are required.
Achieves significant improvements in expression manipulation quality.
Abstract
Manipulating facial expressions is a challenging task due to fine-grained shape changes produced by facial muscles and the lack of input-output pairs for supervised learning. Unlike previous methods using Generative Adversarial Networks (GAN), which rely on cycle-consistency loss or sparse geometry (landmarks) loss for expression synthesis, we propose a novel GAN framework to exploit 3D dense (depth and surface normals) information for expression manipulation. However, a large-scale dataset containing RGB images with expression annotations and their corresponding depth maps is not available. To this end, we propose to use an off-the-shelf state-of-the-art 3D reconstruction model to estimate the depth and create a large-scale RGB-Depth dataset after a manual data clean-up process. We utilise this dataset to minimise the novel depth consistency loss via adversarial learning (note we do…
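The abstract names depth and surface normals as the two dense geometry cues. As a minimal sketch of how the two relate, the snippet below derives per-pixel unit normals from a depth map using finite differences in NumPy. This is an illustrative assumption, not the authors' pipeline: the function name `normals_from_depth` is hypothetical, and the paper may obtain normals differently (for example, directly from the 3D reconstruction model).

```python
import numpy as np


def normals_from_depth(depth: np.ndarray) -> np.ndarray:
    """Estimate unit surface normals from an (H, W) depth map.

    Hypothetical helper for illustration only; returns an (H, W, 3) array.
    """
    depth = depth.astype(np.float64)
    # np.gradient returns derivatives along axis 0 (rows, y) then axis 1 (columns, x).
    dz_dy, dz_dx = np.gradient(depth)
    # For a surface z = f(x, y), an unnormalised normal is (-dz/dx, -dz/dy, 1).
    normals = np.dstack((-dz_dx, -dz_dy, np.ones_like(depth)))
    # Scale every per-pixel vector to unit length.
    normals /= np.linalg.norm(normals, axis=2, keepdims=True)
    return normals
```

On a constant depth map every normal points straight along the z axis, which makes a convenient sanity check before applying the function to estimated face depth.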
Taxonomy
Topics: Face recognition and analysis · Generative Adversarial Networks and Image Synthesis · Human Pose and Action Recognition
