XR-MBT: Multi-modal Full Body Tracking for XR through Self-Supervision with Learned Depth Point Cloud Registration

Denys Rozumnyi; Nadine Bertsch; Othman Sbai; Filippo Arcadu; Yuhua Chen; Artsiom Sanakoyeu; Manoj Kumar; Catherine Herold; Robin Kips

arXiv:2411.18377·cs.CV·November 28, 2024

TL;DR

This paper introduces XR-MBT, a self-supervised multi-modal full body tracking system for XR devices that leverages depth point clouds to accurately track legs and other body parts in real time.

Contribution

The paper presents a self-supervised learning approach that uses egocentric depth point clouds for full body tracking in XR, including the legs. The authors describe this as the first use of the headset's depth sensing signal for this purpose.

Findings

01

Accurately tracks full body motions including legs in XR.

02

Outperforms state-of-the-art body tracking systems.

03

Enables real-time multi-modal pose estimation on XR devices.

Abstract

Tracking the full body motions of users in XR (AR/VR) devices is a fundamental challenge to bring a sense of authentic social presence. Due to the absence of dedicated leg sensors, currently available body tracking methods adopt a synthesis approach to generate plausible motions given a 3-point signal from the head and controller tracking. In order to enable mixed reality features, modern XR devices are capable of estimating depth information of the headset surroundings using available sensors combined with dedicated machine learning models. Such egocentric depth sensing cannot drive the body directly, as it is not registered and is incomplete due to limited field-of-view and body self-occlusions. For the first time, we propose to leverage the available depth sensing signal combined with self-supervision to learn a multi-modal pose estimation model capable of tracking full body motions…

Taxonomy

Topics: Medical Image Segmentation Techniques · Human Pose and Action Recognition · Medical Imaging and Analysis

Methods: ADaptive gradient method with the OPTimal convergence rate