Capturing Head Avatar with Hand Contacts from a Monocular Video

Haonan He; Yufeng Zheng; Jie Song

arXiv:2510.17181·cs.CV·October 21, 2025

TL;DR

This paper introduces a framework that builds photorealistic 3D head avatars from monocular videos while also capturing natural hand-face interactions, which most earlier head-avatar methods ignore.

Contribution

It proposes jointly learning a head avatar and the non-rigid deformations induced by hand contact. Pose tracking combines a depth order loss with contact regularization, the hand-induced deformations are represented with a PCA basis, and the method also uses a physics-inspired contact loss.
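To make the idea of a PCA basis for deformations concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the basis is fit to example per-vertex displacement fields; the function names (`fit_deformation_basis`, `encode`, `decode`) and the array layout are hypothetical and chosen only for illustration.

```python
import numpy as np


def fit_deformation_basis(offsets, num_components):
    """Fit a PCA basis to example deformations (illustrative assumption).

    offsets: array of shape (num_examples, num_vertices, 3) holding
             per-vertex displacements for each example.
    Returns the mean displacement (flattened) and the top principal
    directions as rows of a (num_components, num_vertices * 3) matrix.
    """
    flat = offsets.reshape(offsets.shape[0], -1)
    mean = flat.mean(axis=0)
    # Rows of vt are the principal directions of the centered data.
    _, _, vt = np.linalg.svd(flat - mean, full_matrices=False)
    return mean, vt[:num_components]


def encode(frame_offsets, mean, basis):
    """Project one (num_vertices, 3) displacement field onto the basis."""
    return basis @ (frame_offsets.reshape(-1) - mean)


def decode(coeffs, mean, basis):
    """Rebuild a (num_vertices, 3) displacement field from coefficients."""
    return (mean + coeffs @ basis).reshape(-1, 3)
```

With such a basis, a deformation is described by a handful of coefficients instead of one free 3D offset per vertex, which is the usual reason for choosing a low-dimensional basis when the observations are limited.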

Findings

1. Outperforms state-of-the-art surface reconstruction methods in appearance and geometry accuracy.

2. Effectively captures hand-face interactions and non-rigid deformations from monocular videos.

3. Reduces artifacts and improves physical plausibility of the reconstructed avatars.

Abstract

Photorealistic 3D head avatars are vital for telepresence, gaming, and VR. However, most methods focus solely on facial regions, ignoring natural hand-face interactions, such as a hand resting on the chin or fingers gently touching the cheek, which convey cognitive states like pondering. In this work, we present a novel framework that jointly learns detailed head avatars and the non-rigid deformations induced by hand-face interactions. There are two principal challenges in this task. First, naively tracking hand and face separately fails to capture their relative poses. To overcome this, we propose to combine depth order loss with contact regularization during pose tracking, ensuring correct spatial relationships between the face and hand. Second, no publicly available priors exist for hand-induced deformations, making them non-trivial to learn from monocular videos. To address this,…
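The abstract names a depth order loss and a contact regularization for pose tracking but does not give their formulas here. The sketch below shows one common way such terms can be written, purely as an assumption: a hinge penalty where the hand should be in front of the face but is not, and a mean nearest-neighbor distance between hand and face points. The function names, the `margin` parameter, and the convention that smaller depth means closer to the camera are all hypothetical.

```python
import numpy as np


def depth_order_loss(hand_depth, face_depth, hand_in_front, margin=0.0):
    """Hinge penalty on depth-order violations (illustrative assumption).

    hand_depth, face_depth: per-pixel depths, smaller means closer.
    hand_in_front: boolean mask of pixels where the hand is observed
                   to occlude the face in the image.
    """
    violation = np.maximum(hand_depth - face_depth + margin, 0.0)
    mask = hand_in_front.astype(float)
    # Average only over pixels where the ordering is known.
    return float((violation * mask).sum() / max(mask.sum(), 1.0))


def contact_regularizer(hand_points, face_points):
    """Mean distance from each hand point to its nearest face point.

    hand_points: (H, 3) candidate contact points on the hand.
    face_points: (F, 3) points on the face surface.
    """
    diff = hand_points[:, None, :] - face_points[None, :, :]
    dists = np.linalg.norm(diff, axis=-1)  # pairwise (H, F) distances
    return float(dists.min(axis=1).mean())
```

In a tracker, terms like these would typically be weighted and added to the image-fitting objective so that separately estimated hand and face poses are pulled into a consistent relative placement.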

Taxonomy

Topics: Face recognition and analysis · Social Robot Interaction and HRI · Human Pose and Action Recognition