XR-MBT: Multi-modal Full Body Tracking for XR through Self-Supervision with Learned Depth Point Cloud Registration
Denys Rozumnyi, Nadine Bertsch, Othman Sbai, Filippo Arcadu, Yuhua Chen, Artsiom Sanakoyeu, Manoj Kumar, Catherine Herold, Robin Kips

TL;DR
This paper introduces XR-MBT, a self-supervised multi-modal full body tracking system for XR devices that leverages depth point clouds to accurately track legs and other body parts in real time.
Contribution
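It presents a self-supervised learning approach that combines egocentric depth point clouds with the 3-point head and controller signal to track the full body in XR. This includes the legs, which existing methods can only synthesize as plausible motion because no dedicated leg sensors are available.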
Findings
Accurately tracks full body motions including legs in XR.
Outperforms state-of-the-art body tracking systems.
Enables real-time multi-modal pose estimation on XR devices.
Abstract
Tracking the full body motions of users in XR (AR/VR) devices is a fundamental challenge to bring a sense of authentic social presence. Due to the absence of dedicated leg sensors, currently available body tracking methods adopt a synthesis approach to generate plausible motions given a 3-point signal from the head and controller tracking. In order to enable mixed reality features, modern XR devices are capable of estimating depth information of the headset surroundings using available sensors combined with dedicated machine learning models. Such egocentric depth sensing cannot drive the body directly, as it is not registered and is incomplete due to limited field-of-view and body self-occlusions. For the first time, we propose to leverage the available depth sensing signal combined with self-supervision to learn a multi-modal pose estimation model capable of tracking full body motions…
Taxonomy
Topics: Medical Image Segmentation Techniques · Human Pose and Action Recognition · Medical Imaging and Analysis
Methods: ADaptive gradient method with the OPTimal convergence rate
