Neural Capture of Animatable 3D Human from Monocular Video
Gusi Te, Xiu Li, Xiao Li, Jinglu Wang, Wei Hu, Yan Lu

TL;DR
This paper introduces a method to create an animatable 3D human model from a single video, enabling rendering in new poses and views by embedding surface relationships into a dynamic NeRF framework.
Contribution
It proposes a novel embedding technique based on surface geodesic neighbors to improve generalization of dynamic NeRFs to unseen poses from monocular videos.
Findings
Successfully renders humans in unseen poses and views.
Outperforms previous methods in quality and generalization.
Reduces dependency on multi-view data or precise 3D geometry.
Abstract
We present a novel paradigm of building an animatable 3D human representation from a monocular video input, such that it can be rendered in any unseen poses and views. Our method is based on a dynamic Neural Radiance Field (NeRF) rigged by a mesh-based parametric 3D human model serving as a geometry proxy. Previous methods usually rely on multi-view videos or accurate 3D geometry information as additional inputs; besides, most methods suffer from degraded quality when generalized to unseen poses. We identify that the key to generalization is a good input embedding for querying dynamic NeRF: A good input embedding should define an injective mapping in the full volumetric space, guided by surface mesh deformation under pose variation. Based on this observation, we propose to embed the input query with its relationship to local surface regions spanned by a set of geodesic nearest neighbors…
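The abstract is truncated above, so the exact form of the embedding is not shown here. As a rough illustration only, the sketch below shows one way a query point could be described relative to a local surface patch: find the closest mesh vertex, collect its nearest neighbors by geodesic (along-edge) distance, and record the offsets from those vertices to the query. The function names (`build_edge_graph`, `geodesic_knn`, `embed_query`), the use of Dijkstra on mesh edges as a geodesic approximation, and the choice of raw offsets as features are all assumptions made for this example, not the authors' implementation.

```python
import heapq
import numpy as np


def build_edge_graph(vertices, faces):
    """Adjacency map of the mesh, weighted by Euclidean edge length."""
    graph = {i: {} for i in range(len(vertices))}
    for face in faces:
        a, b, c = (int(i) for i in face)
        for u, v in ((a, b), (b, c), (c, a)):
            w = float(np.linalg.norm(vertices[u] - vertices[v]))
            graph[u][v] = w
            graph[v][u] = w
    return graph


def geodesic_knn(graph, source, k):
    """Return the k vertices closest to `source` along mesh edges.

    Uses Dijkstra's algorithm, so distances are shortest edge paths,
    which only approximate true surface geodesics. The source itself
    is the first entry of the result.
    """
    best = {source: 0.0}
    heap = [(0.0, source)]
    visited = set()
    ordered = []
    while heap and len(ordered) < k:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue
        visited.add(u)
        ordered.append(u)
        for v, w in graph[u].items():
            nd = d + w
            if nd < best.get(v, float("inf")):
                best[v] = nd
                heapq.heappush(heap, (nd, v))
    return ordered


def embed_query(x, vertices, graph, k=4):
    """Hypothetical embedding: offsets from k geodesic neighbors to x."""
    anchor = int(np.argmin(np.linalg.norm(vertices - x, axis=1)))
    neighbors = geodesic_knn(graph, anchor, k)
    offsets = x - vertices[neighbors]  # shape (k, 3)
    return neighbors, offsets.reshape(-1)
```

Because the features are expressed relative to mesh vertices, moving the mesh and the query together (here, by a rigid translation) leaves the embedding unchanged, which is the kind of surface-guided behavior the abstract describes. A real system would need features that also behave well under rotation and articulated deformation, which this sketch does not attempt.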
Taxonomy
Topics: Human Pose and Action Recognition · 3D Shape Modeling and Analysis · Advanced Vision and Imaging
