Realistic Full-Body Tracking from Sparse Observations via Joint-Level Modeling
Xiaozheng Zheng, Zhuo Su, Chao Wen, Zhou Xue, Xiaojie Jin

TL;DR
This paper introduces a two-stage neural network framework that accurately reconstructs full-body 3D motions from sparse head and hand tracking data, enhancing VR/AR avatar realism.
Contribution
The novel joint-level modeling approach combined with transformer-based spatiotemporal analysis improves full-body motion estimation from limited signals.
Findings
Outperforms existing methods in accuracy and smoothness
Validated on AMASS dataset and real-captured data
Effective joint-level feature utilization
Abstract
To bridge the physical and virtual worlds for rapidly developing VR/AR applications, the ability to realistically drive 3D full-body avatars is of great significance. Although real-time body tracking with only head-mounted displays (HMDs) and hand controllers is heavily under-constrained, a carefully designed end-to-end neural network has great potential to solve the problem by learning from large-scale motion data. To this end, we propose a two-stage framework that can obtain accurate and smooth full-body motions with the three tracking signals of head and hands only. Our framework explicitly models the joint-level features in the first stage and utilizes them as spatiotemporal tokens for alternating spatial and temporal transformer blocks to capture joint-level correlations in the second stage. Furthermore, we design a set of loss terms to constrain the task of a high degree of…
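The alternating spatial and temporal attention described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a token grid indexed as x[frame][joint], uses unparameterised attention (no learned projections, residuals, normalisation, or MLPs), and assumes the spatial block runs before the temporal one. All function names are hypothetical and chosen for illustration only.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(v - m) for v in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    # Scaled dot-product self-attention with identity Q/K/V projections
    # (an assumption made to keep the sketch short; real blocks learn these).
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, tokens)) for c in range(d)])
    return out

def spatial_block(x):
    # Attend across joints within each frame; frames never mix here.
    return [self_attention(frame) for frame in x]

def temporal_block(x):
    # Attend across frames for each joint; joints never mix here.
    n_frames, n_joints = len(x), len(x[0])
    per_joint = [self_attention([x[t][j] for t in range(n_frames)])
                 for j in range(n_joints)]
    return [[per_joint[j][t] for j in range(n_joints)] for t in range(n_frames)]

def alternate(x, depth):
    # Assumed ordering: spatial first, then temporal, repeated `depth` times.
    for _ in range(depth):
        x = temporal_block(spatial_block(x))
    return x

# Toy grid: 3 frames, 4 joints, 2 feature channels per joint-level token.
tokens = [[[float(t + j), 1.0] for j in range(4)] for t in range(3)]
mixed = alternate(tokens, depth=2)
```

The point of the sketch is the grouping: the same token grid is sliced by frame for spatial attention and by joint for temporal attention, so each joint-level token can draw on both the other joints and its own history.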
Taxonomy
Topics: Surgical Simulation and Training · Virtual Reality Applications and Impacts · Human Pose and Action Recognition
