TL;DR
This paper introduces SelfPose, a novel method for egocentric 3D body pose estimation from head-mounted camera images, utilizing a multi-branch decoder and synthetic data to improve accuracy and generalization.
Contribution
The paper presents a new encoder-decoder architecture with a multi-branch decoder, together with a large synthetic dataset (xR-EgoPose) for egocentric 3D pose estimation, and reports state-of-the-art results.
Findings
Significant accuracy improvements over existing egocentric methods.
High generalization from synthetic to real-world data.
Competitive performance on the Human3.6M benchmark.
Abstract
We present a solution to egocentric 3D body pose estimation from monocular images captured from downward-looking fish-eye cameras installed on the rim of a head-mounted VR device. This unusual viewpoint leads to images with unique visual appearance, with severe self-occlusions and perspective distortions that result in drastic differences in resolution between lower and upper body. We propose an encoder-decoder architecture with a novel multi-branch decoder designed to account for the varying uncertainty in 2D predictions. The quantitative evaluation, on synthetic and real-world datasets, shows that our strategy leads to substantial improvements in accuracy over state-of-the-art egocentric approaches. To tackle the lack of labelled data, we also introduced a large photo-realistic synthetic dataset. xR-EgoPose offers high quality renderings of people with diverse skin tones, body shapes…
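The multi-branch idea in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: the abstract does not say what each branch predicts, so the code assumes one branch regresses 3D joint positions and a second reconstructs the input 2D heatmaps, which would push the shared embedding to retain how confident the 2D predictions were. The joint count, heatmap size, embedding size, loss weight, and all function names are hypothetical, and plain dense layers stand in for whatever layers the authors use.

```python
import random

NUM_JOINTS = 16      # hypothetical joint count
HEATMAP_SIZE = 8     # hypothetical 8x8 heatmap per joint
EMBED_DIM = 20       # hypothetical embedding size
HM_DIM = NUM_JOINTS * HEATMAP_SIZE * HEATMAP_SIZE

def linear(in_dim, out_dim, rng):
    """Create a randomly initialised dense layer as (weights, bias)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
               for _ in range(out_dim)]
    return weights, [0.0] * out_dim

def apply(layer, x, relu=False):
    """Apply a dense layer to vector x, optionally followed by ReLU."""
    weights, bias = layer
    y = [sum(w * v for w, v in zip(row, x)) + b
         for row, b in zip(weights, bias)]
    return [max(0.0, v) for v in y] if relu else y

def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)

rng = random.Random(0)
encoder = linear(HM_DIM, EMBED_DIM, rng)          # heatmaps -> embedding
pose_branch = linear(EMBED_DIM, NUM_JOINTS * 3, rng)   # assumed branch 1
heatmap_branch = linear(EMBED_DIM, HM_DIM, rng)        # assumed branch 2

def forward(heatmaps_flat):
    """Encode flattened 2D heatmaps, then decode with both branches."""
    z = apply(encoder, heatmaps_flat, relu=True)
    pose_3d = apply(pose_branch, z)            # flat list of x, y, z per joint
    heatmaps_rec = apply(heatmap_branch, z)    # reconstruction of the input
    return pose_3d, heatmaps_rec

def total_loss(pose_3d, heatmaps_rec, gt_pose, heatmaps_flat, rec_weight=0.5):
    """Pose error plus a weighted reconstruction error (weight is assumed)."""
    return mse(pose_3d, gt_pose) + rec_weight * mse(heatmaps_rec, heatmaps_flat)

# Toy usage with random stand-ins for detector output and ground truth.
heatmaps = [rng.random() for _ in range(HM_DIM)]
gt_pose = [rng.uniform(-1.0, 1.0) for _ in range(NUM_JOINTS * 3)]
pose_3d, heatmaps_rec = forward(heatmaps)
loss = total_loss(pose_3d, heatmaps_rec, gt_pose, heatmaps)
```

Under these assumptions, only the pose branch would be needed at inference time; the reconstruction branch acts purely as a training signal on the shared embedding.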
