Joint Learning of Depth and Appearance for Portrait Image Animation
Xinya Ji, Gaspard Zoss, Prashanth Chandran, Lingchen Yang, Xun Cao, Barbara Solenthaler, Derek Bradley

TL;DR
This paper introduces a diffusion-based framework that jointly learns visual appearance and depth for portrait images, enabling consistent 3D-aware applications like relighting and talking head animation.
Contribution
It presents a novel end-to-end diffusion architecture that learns appearance and depth simultaneously, addressing the gap in co-generating 3D-consistent visual and depth outputs.
Findings
Effective joint learning of appearance and depth.
Versatile adaptation to multiple downstream tasks.
Improved 3D consistency in portrait image generation.
Abstract
2D portrait animation has experienced significant advancements in recent years. Much research has utilized the prior knowledge embedded in large generative diffusion models to enhance high-quality image manipulation. However, most methods only focus on generating RGB images as output, and the co-generation of consistent visual plus 3D output remains largely under-explored. In our work, we propose to jointly learn the visual appearance and depth simultaneously in a diffusion-based portrait image generator. Our method embraces the end-to-end diffusion paradigm and introduces a new architecture suitable for learning this conditional joint distribution, consisting of a reference network and a channel-expanded diffusion backbone. Once trained, our framework can be efficiently adapted to various downstream applications, such as facial depth-to-image and image-to-depth generation, portrait…
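One way to picture the "channel-expanded" idea from the abstract is a diffusion sample that carries depth as an extra channel next to RGB, so both are noised and denoised together. The sketch below is only an illustration under that assumption: the function names, the 4-channel layout, and the use of the standard DDPM forward-noising formula are hypothetical choices, and the paper's actual tensor layout, latent space, reference network, and noise schedule are not given in this summary.

```python
import numpy as np


def make_joint_sample(rgb, depth):
    """Stack RGB (H, W, 3) and depth (H, W) into one (H, W, 4) array.

    Hypothetical layout: channels 0-2 are appearance, channel 3 is depth.
    """
    if rgb.shape[:2] != depth.shape:
        raise ValueError("rgb and depth must share the same spatial size")
    return np.concatenate([rgb, depth[..., None]], axis=-1)


def add_noise(x0, alpha_bar_t, rng):
    """Standard DDPM forward step: x_t = sqrt(a) * x0 + sqrt(1 - a) * eps."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps
    return x_t, eps


def split_joint_sample(x):
    """Separate a joint (H, W, 4) array back into RGB and depth."""
    return x[..., :3], x[..., 3]


# Toy data standing in for a normalized portrait and its depth map.
rng = np.random.default_rng(0)
rgb = rng.uniform(-1.0, 1.0, size=(8, 8, 3))
depth = rng.uniform(-1.0, 1.0, size=(8, 8))

joint = make_joint_sample(rgb, depth)
noisy, eps = add_noise(joint, alpha_bar_t=0.5, rng=rng)
rgb_out, depth_out = split_joint_sample(noisy)
```

Because both modalities share one noised array, a single denoiser sees appearance and depth at every step, which is one plausible reading of how a joint distribution could be learned; conditioning on a reference image is omitted here.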
Taxonomy
Topics: 3D Shape Modeling and Analysis · Face recognition and analysis · Generative Adversarial Networks and Image Synthesis
Methods: Diffusion · Focus
