Universal Facial Encoding of Codec Avatars from VR Headsets
Shaojie Bai, Te-Li Wang, Chenghui Li, Akshay Venkatesh, Tomas Simon, Chen Cao, Gabriel Schwartz, Ryan Wrench, Jason Saragih, Yaser Sheikh, Shih-En Wei

TL;DR
This paper introduces a real-time, self-supervised facial animation system for VR avatars that generalizes well to unseen users despite challenging conditions like oblique views and environmental variability.
Contribution
The paper presents a self-supervised learning method, a lightweight expression calibration step, and an improved parameterization that together enable robust, real-time facial animation from VR headset cameras.
Findings
Significant improvements over prior face-encoding methods.
Robust real-time animation for unseen users.
Effective handling of environmental and view variability.
Abstract
Faithful real-time facial animation is essential for avatar-mediated telepresence in Virtual Reality (VR). To emulate authentic communication, avatar animation needs to be efficient and accurate: able to capture both extreme and subtle expressions within a few milliseconds to sustain the rhythm of natural conversations. The oblique and incomplete views of the face, variability in the donning of headsets, and illumination variation due to the environment are some of the unique challenges in generalization to unseen faces. In this paper, we present a method that can animate a photorealistic avatar in realtime from head-mounted cameras (HMCs) on a consumer VR headset. We present a self-supervised learning approach, based on a cross-view reconstruction objective, that enables generalization to unseen users. We present a lightweight expression calibration mechanism that increases accuracy…
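The cross-view reconstruction objective named in the abstract can be illustrated with a toy sketch. This is only one plausible reading, not the authors' formulation: it assumes the objective encodes one camera view of a frame into a compact code and scores how well a decoder predicts a different view of the same frame. The linear encoder and decoder, the 4-value "views", and the names `linear`, `mse`, and `cross_view_loss` are all hypothetical.

```python
import random

def linear(weights, x):
    """Apply a dense linear map given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)

def cross_view_loss(enc_w, dec_w, source_view, target_view):
    """Encode one view, decode a prediction of another view of the same
    frame, and score the prediction against the real target view.
    (Assumed form of the objective; the paper's actual loss may differ.)"""
    code = linear(enc_w, source_view)   # hypothetical expression code
    predicted = linear(dec_w, code)     # predicted values for the other view
    return mse(predicted, target_view)

# Toy data: two 4-value "views" of the same frame (made-up numbers).
view_a = [0.2, 0.4, 0.6, 0.8]
view_b = [0.8, 0.6, 0.4, 0.2]

# Randomly initialised 4 -> 2 encoder and 2 -> 4 decoder (illustrative only).
rng = random.Random(0)
enc_w = [[rng.uniform(-0.5, 0.5) for _ in range(4)] for _ in range(2)]
dec_w = [[rng.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(4)]

loss = cross_view_loss(enc_w, dec_w, view_a, view_b)
print(f"cross-view reconstruction loss: {loss:.4f}")
```

Because the target is simply another camera's image of the same moment, a loss of this shape needs no expression labels, which is the sense in which such an objective is self-supervised.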
