Capturing Head Avatar with Hand Contacts from a Monocular Video
Haonan He, Yufeng Zheng, Jie Song

TL;DR
This paper introduces a framework for creating photorealistic 3D head avatars from monocular videos that also captures natural hand-face interactions, which most prior methods ignore.
Contribution
It proposes jointly learning head avatars and hand-induced deformations, combining a depth order loss and contact regularization during pose tracking with a PCA basis for deformations and a physics-inspired contact loss.
Findings
Outperforms state-of-the-art surface reconstruction methods in appearance and geometry accuracy.
Effectively captures hand-face interactions and non-rigid deformations from monocular videos.
Reduces artifacts and improves physical plausibility of the reconstructed avatars.
Abstract
Photorealistic 3D head avatars are vital for telepresence, gaming, and VR. However, most methods focus solely on facial regions, ignoring natural hand-face interactions, such as a hand resting on the chin or fingers gently touching the cheek, which convey cognitive states like pondering. In this work, we present a novel framework that jointly learns detailed head avatars and the non-rigid deformations induced by hand-face interactions. There are two principal challenges in this task. First, naively tracking hand and face separately fails to capture their relative poses. To overcome this, we propose to combine depth order loss with contact regularization during pose tracking, ensuring correct spatial relationships between the face and hand. Second, no publicly available priors exist for hand-induced deformations, making them non-trivial to learn from monocular videos. To address this,…
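The pose-tracking idea in the abstract, a depth order loss combined with contact regularization, can be illustrated with a rough sketch. This is not the authors' formulation: it assumes a per-pixel hinge penalty on rendered depths and a nearest-neighbor distance penalty on contact points. The function names, the `margin` parameter, and the input layout are hypothetical and chosen only for illustration.

```python
import math


def depth_order_loss(hand_depth, face_depth, hand_in_front, margin=0.0):
    """Hinge penalty when depths contradict the expected ordering (assumed form).

    hand_depth, face_depth: per-pixel depths (smaller means closer to the camera).
    hand_in_front: per-pixel booleans, True where the hand should occlude the face.
    """
    total = 0.0
    for dh, df, front in zip(hand_depth, face_depth, hand_in_front):
        # Positive when the ordering is violated, negative when it is satisfied.
        violation = (dh - df) if front else (df - dh)
        total += max(0.0, violation + margin)
    return total / max(len(hand_depth), 1)


def contact_regularizer(hand_contact_points, face_points):
    """Mean distance from each hand contact point to its nearest face point.

    This is an assumed stand-in for contact regularization: it is zero when
    every contact point lies exactly on a face point.
    """
    if not hand_contact_points:
        return 0.0
    total = 0.0
    for h in hand_contact_points:
        total += min(math.dist(h, f) for f in face_points)
    return total / len(hand_contact_points)


# Toy example: the second pixel violates the ordering by 1.0 depth unit.
order = depth_order_loss([1.0, 2.0], [1.5, 1.0], [True, True])
contact = contact_regularizer([(0.0, 0.0, 0.0)], [(0.0, 0.0, 1.0), (3.0, 4.0, 0.0)])
tracking_loss = order + contact
```

In a real tracker these terms would be weighted and minimized over hand and head pose parameters with a differentiable renderer; the sketch only shows what each term measures.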
Taxonomy
Topics: Face recognition and analysis · Social Robot Interaction and HRI · Human Pose and Action Recognition
