Bridging the Gap: Studio-like Avatar Creation from a Monocular Phone Capture

ShahRukh Athar; Shunsuke Saito; Zhengyu Yang; Stanislav Pidhorsky; Chen Cao

arXiv:2407.19593·cs.CV·July 31, 2024

Open Access

TL;DR

This paper introduces a novel method to generate studio-like, photorealistic facial textures for avatars from simple monocular phone captures by leveraging StyleGAN2 and diffusion models, significantly improving detail and lighting realism.

Contribution

It proposes an approach that parameterizes phone texture maps in StyleGAN2's $W^+$ space and applies diffusion super-resolution to produce studio-like textures from casual phone videos, narrowing the quality gap to studio-captured avatars.

Findings

01 Produces photorealistic, uniformly lit avatars from monocular captures

02 Enhances facial detail accuracy with diffusion super-resolution

03 Achieves near-studio quality textures from casual phone videos

Abstract

Creating photorealistic avatars for individuals traditionally involves extensive capture sessions with complex and expensive studio devices like the LightStage system. While recent strides in neural representations have enabled the generation of photorealistic and animatable 3D avatars from quick phone scans, they have the capture-time lighting baked-in, lack facial details and have missing regions in areas such as the back of the ears. Thus, they lag in quality compared to studio-captured avatars. In this paper, we propose a method that bridges this gap by generating studio-like illuminated texture maps from short, monocular phone captures. We do this by parameterizing the phone texture maps using the $W^{+}$ space of a StyleGAN2, enabling near-perfect reconstruction. Then, we finetune a StyleGAN2 by sampling in the $W^{+}$ parameterized space using a very small set of studio-captured…
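The abstract's first step, parameterizing a texture map in the $W^{+}$ space, amounts to optimizing one latent code per generator layer until the generator reproduces the target. The sketch below illustrates only that idea and is not the authors' implementation: the "generator" is a hypothetical stack of fixed random linear maps standing in for StyleGAN2, the "texture" is a short random vector, and the names `make_toy_generator`, `generate`, and `invert` are invented for this example.

```python
import numpy as np


def make_toy_generator(num_layers, latent_dim, out_dim, seed=0):
    """Return fixed random per-layer linear maps (a toy stand-in, not StyleGAN2)."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((num_layers, out_dim, latent_dim)) / np.sqrt(latent_dim)


def generate(weights, w_plus):
    """Sum each layer's contribution; w_plus holds one latent code per layer."""
    return np.einsum("lod,ld->o", weights, w_plus)


def invert(weights, target, steps=500, lr=0.05):
    """Fit a W+ style code to the target by gradient descent on squared error."""
    num_layers, _, latent_dim = weights.shape
    w_plus = np.zeros((num_layers, latent_dim))
    for _ in range(steps):
        residual = generate(weights, w_plus) - target
        # Gradient of ||G(w_plus) - target||^2 with respect to each layer's code.
        grad = 2.0 * np.einsum("lod,o->ld", weights, residual)
        w_plus -= lr * grad
    return w_plus


if __name__ == "__main__":
    weights = make_toy_generator(num_layers=4, latent_dim=16, out_dim=8)
    target = np.random.default_rng(1).standard_normal(8)  # toy "texture"
    w_plus = invert(weights, target)
    error = np.max(np.abs(generate(weights, w_plus) - target))
    print("per-layer code shape:", w_plus.shape)
    print("max reconstruction error:", error)
```

Because every layer gets its own code, the optimization has more degrees of freedom than a single shared latent, which is the usual reason $W^{+}$ inversion reconstructs a target closely. A real pipeline would replace the linear maps with a pretrained StyleGAN2 synthesis network and would typically add a perceptual loss to the pixel loss.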

Taxonomy

Topics: Augmented Reality Applications · Educational Games and Gamification

Methods: Sparse Evolutionary Training · Path Length Regularization · Weight Demodulation · R1 Regularization · Convolution · StyleGAN2 · Diffusion