TL;DR
This paper introduces a two-stage, generalizable method for multi-view 3D human pose estimation that separates detection from regression, enabling adaptation to new environments without extensive retraining.
Contribution
The proposed approach uniquely separates single-view pose detection from multi-view 3D regression, allowing effective generalization across different camera setups and environments.
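To make the separation concrete, here is a minimal sketch of the two-stage flow. It assumes calibrated cameras given as 3x4 projection matrices and a hypothetical per-view 2D detector (`detect_2d`). For stage 2 it substitutes classical linear (DLT) triangulation for the learned multi-view regression described in the paper, so it illustrates only the interface between the stages, not the authors' model. The helpers `camera`, `project`, `triangulate_dlt` and `estimate_pose` are illustrative names.

```python
import numpy as np


def camera(theta, f=1000.0, c=500.0, dist=5.0):
    """Hypothetical pinhole camera rotated by theta about the y-axis, as K[R|t]."""
    K = np.array([[f, 0.0, c], [0.0, f, c], [0.0, 0.0, 1.0]])
    ct, st = np.cos(theta), np.sin(theta)
    R = np.array([[ct, 0.0, st], [0.0, 1.0, 0.0], [-st, 0.0, ct]])
    t = np.array([[0.0], [0.0], [dist]])
    return K @ np.hstack([R, t])


def project(P, X):
    """Project (J, 3) world points with a 3x4 camera matrix to (J, 2) pixels."""
    Xh = np.hstack([X, np.ones((X.shape[0], 1))])
    x = Xh @ P.T
    return x[:, :2] / x[:, 2:3]


def triangulate_dlt(Ps, xs):
    """Stage 2 stand-in: linear (DLT) triangulation of each joint from V views.

    Ps: list of V camera matrices (3, 4); xs: (V, J, 2) 2D joints.
    Returns (J, 3) 3D joints.
    """
    n_joints = xs.shape[1]
    out = np.zeros((n_joints, 3))
    for j in range(n_joints):
        rows = []
        for P, (u, v) in zip(Ps, xs[:, j]):
            # each view adds two linear constraints on the homogeneous 3D point
            rows.append(u * P[2] - P[0])
            rows.append(v * P[2] - P[1])
        _, _, Vt = np.linalg.svd(np.stack(rows))
        Xh = Vt[-1]  # right singular vector of the smallest singular value
        out[j] = Xh[:3] / Xh[3]
    return out


def estimate_pose(images, Ps, detect_2d):
    """Two stages: independent single-view detection, then multi-view lifting."""
    xs = np.stack([detect_2d(img) for img in images])  # stage 1: (V, J, 2)
    return triangulate_dlt(Ps, xs)  # stage 2: (J, 3)


# Usage with a fake detector that returns exact projections (no real images).
rng = np.random.default_rng(0)
X_true = rng.uniform(-0.5, 0.5, size=(17, 3))
Ps = [camera(a) for a in (0.0, np.pi / 2, -np.pi / 2)]


def fake_detect(view):
    # hypothetical stand-in for a single-view 2D pose detector
    return project(Ps[view], X_true)


X_est = estimate_pose(range(3), Ps, fake_detect)
print(np.abs(X_est - X_true).max())  # near zero for noise-free detections
```

In this sketch, moving to a different camera rig only changes `Ps`; the single-view detector used in stage 1 is untouched, which is the property the two-stage split is meant to exploit.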
Findings
Achieves competitive results on the Human3.6M dataset.
Significantly improves performance on a multi-view clinical dataset.
Models detector characteristics to enhance robustness.
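One way to model detector characteristics when training the multi-view stage on MV data is to replace clean projected ground-truth joints with detector-like inputs. The sketch below assumes isotropic Gaussian localization error (optionally with a different scale per joint) and independent missed detections. The function `simulate_detections` and its parameters `sigma_px` and `miss_prob` are hypothetical, and the paper's actual error model may differ.

```python
import numpy as np


def simulate_detections(X, Ps, sigma_px=4.0, miss_prob=0.1, rng=None):
    """Turn ground-truth 3D joints into detector-like 2D inputs for each view.

    X: (J, 3) ground-truth joints; Ps: list of V 3x4 camera matrices.
    sigma_px: scalar or (J,) per-joint std of the 2D localization error, in pixels.
    miss_prob: probability that a joint is independently missed in a view.
    Returns xs (V, J, 2) pixel coordinates (zeroed where missed) and
    conf (V, J) with 1.0 for kept joints and 0.0 for missed ones.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_joints = X.shape[0]
    sigma = np.broadcast_to(np.asarray(sigma_px, dtype=float), (n_joints,))
    Xh = np.hstack([X, np.ones((n_joints, 1))])
    xs, conf = [], []
    for P in Ps:
        x = Xh @ P.T
        uv = x[:, :2] / x[:, 2:3]  # clean projection of the ground truth
        uv = uv + rng.standard_normal(uv.shape) * sigma[:, None]  # localization noise
        keep = (rng.random(n_joints) >= miss_prob).astype(float)  # simulated misses
        xs.append(uv * keep[:, None])
        conf.append(keep)
    return np.stack(xs), np.stack(conf)


# Usage: three noisy, partially missing views of a 17-joint pose.
rng = np.random.default_rng(0)
X = rng.uniform(-0.5, 0.5, size=(17, 3))
P = np.array([[1000.0, 0, 0, 0], [0, 1000.0, 0, 0], [0, 0, 1.0, 5.0]])  # K[I|t]
xs, conf = simulate_detections(X, [P, P, P], sigma_px=3.0, miss_prob=0.1, rng=rng)
print(xs.shape, conf.shape)  # (3, 17, 2) (3, 17)
```

Training the multi-view stage on such samples, together with `conf`, exposes it to the kinds of errors it will see at test time; a closer match would replace the Gaussian with error statistics measured from the actual detector.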
Abstract
Despite the significant improvement in the performance of monocular pose estimation approaches and their ability to generalize to unseen environments, multi-view (MV) approaches often lag behind in accuracy and are specific to certain datasets. This is mainly because (1) unlike real-world single-view (SV) datasets, MV datasets are often captured in controlled environments to collect precise 3D annotations, and these environments do not cover all real-world challenges, and (2) the model parameters are learned for specific camera setups. To alleviate these problems, we propose a two-stage approach to detect and estimate 3D human poses, which separates SV pose detection from MV 3D pose estimation. This separation enables us to utilize each dataset for the right task, i.e., SV datasets for constructing robust pose detection models and MV datasets for constructing precise MV…
