TL;DR
This paper introduces a decoupled deep learning method for 3D human pose estimation from depth images, combining 2D pose detection with residual refinement to improve accuracy and speed in multi-person HRI scenarios.
Contribution
It proposes a novel decoupled approach that separates 2D pose estimation from 3D refinement, using residual regression to enhance 3D pose accuracy from depth data.
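The residual formulation can be illustrated with simple per-joint arithmetic. This is a minimal sketch, not the authors' implementation. It assumes poses are lists of (x, y, z) joint coordinates in the camera frame. The network that predicts the residual is deliberately left out, and all function names are hypothetical. The regression target is the difference between the true and the lifted pose, and the refined pose adds a predicted residual back onto the lifted one.

```python
import math
from typing import List, Tuple

# Hypothetical type: one joint as (x, y, z) in the camera frame (metres assumed).
Point3D = Tuple[float, float, float]


def residual_target(lifted: List[Point3D], true: List[Point3D]) -> List[Point3D]:
    """Per-joint regression target: residual = true pose - lifted pose."""
    return [tuple(t_i - l_i for l_i, t_i in zip(l, t)) for l, t in zip(lifted, true)]


def refine(lifted: List[Point3D], residual: List[Point3D]) -> List[Point3D]:
    """Refined pose = lifted pose + predicted residual (the regressor is not shown)."""
    return [tuple(l_i + r_i for l_i, r_i in zip(l, r)) for l, r in zip(lifted, residual)]


def mean_joint_error(a: List[Point3D], b: List[Point3D]) -> float:
    """Mean Euclidean distance between corresponding joints of two poses."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


# Toy two-joint example with made-up coordinates.
lifted = [(0.0, 0.0, 2.0), (0.1, -0.5, 2.1)]
true = [(0.0, 0.05, 2.1), (0.1, -0.45, 2.15)]
target = residual_target(lifted, true)
# A perfect residual regressor would output `target`, recovering `true`.
refined = refine(lifted, target)
print(mean_joint_error(lifted, true), mean_joint_error(refined, true))
```

The point of the decoupling is visible here: because the lifted pose is already a rough estimate, the learned part only has to model a small correction rather than the full 3D pose.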
Findings
Achieves competitive accuracy on two public datasets.
Offers real-time performance suitable for multi-person HRI.
Outperforms recent state-of-the-art methods in speed and accuracy.
Abstract
We propose to leverage recent advances in reliable 2D pose estimation with Convolutional Neural Networks (CNN) to estimate the 3D pose of people from depth images in multi-person Human-Robot Interaction (HRI) scenarios. Our method is based on the observation that using the depth information to obtain 3D lifted points from 2D body landmark detections provides a rough estimate of the true 3D human pose, thus requiring only a refinement step. Along these lines, our contributions are threefold: (i) we propose to perform 3D pose estimation from depth images by decoupling 2D pose estimation and 3D pose refinement; (ii) we propose a deep-learning approach that regresses the residual pose between the lifted 3D pose and the true 3D pose; (iii) we show that despite its simplicity, our approach achieves very competitive results both in accuracy and speed on two public datasets and is therefore appealing…
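The lifting step described in the abstract, turning 2D landmark detections into rough 3D points using depth, can be sketched with a standard pinhole back-projection. This is an illustrative sketch under stated assumptions, not the paper's exact procedure. It assumes an undistorted pinhole camera with hypothetical intrinsics (fx, fy, cx, cy), a depth map stored as a list of rows in metres with 0 marking missing readings, and a median over a small window around each keypoint to tolerate depth holes. The window size and the helper names are assumptions.

```python
import statistics
from typing import List, Optional, Sequence, Tuple

Point3D = Tuple[float, float, float]


def lift_keypoint(u: float, v: float, z: float,
                  fx: float, fy: float, cx: float, cy: float) -> Point3D:
    """Back-project pixel (u, v) with depth z through a pinhole camera (no distortion)."""
    return ((u - cx) * z / fx, (v - cy) * z / fy, z)


def sample_depth(depth: Sequence[Sequence[float]], u: float, v: float,
                 radius: int = 2) -> Optional[float]:
    """Median of valid (> 0) depth values in a (2*radius+1)^2 window, or None if all are missing."""
    h, w = len(depth), len(depth[0])
    ui, vi = int(round(u)), int(round(v))
    vals = [depth[r][c]
            for r in range(max(0, vi - radius), min(h, vi + radius + 1))
            for c in range(max(0, ui - radius), min(w, ui + radius + 1))
            if depth[r][c] > 0]
    return statistics.median(vals) if vals else None


def lift_pose(keypoints: List[Tuple[float, float]],
              depth: Sequence[Sequence[float]],
              intrinsics: Tuple[float, float, float, float]) -> List[Optional[Point3D]]:
    """Lift every 2D keypoint; joints with no valid depth nearby become None."""
    fx, fy, cx, cy = intrinsics
    lifted: List[Optional[Point3D]] = []
    for u, v in keypoints:
        z = sample_depth(depth, u, v, radius=1)
        lifted.append(None if z is None else lift_keypoint(u, v, z, fx, fy, cx, cy))
    return lifted


# Toy 5x5 depth map at 2 m with a hole at the centre pixel.
depth_map = [[2.0] * 5 for _ in range(5)]
depth_map[2][2] = 0.0
print(lift_pose([(2.0, 2.0), (4.0, 2.0)], depth_map, (500.0, 500.0, 2.0, 2.0)))
```

The resulting lifted pose is only the rough estimate the abstract refers to; detection noise and self-occlusion leave errors that the residual regression step is meant to correct.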
