Towards Robust and Smooth 3D Multi-Person Pose Estimation from Monocular Videos in the Wild
Sungchan Park, Eunyi You, Inhoe Lee, Joonseok Lee

TL;DR
This paper introduces POTR-3D, a novel sequence-to-sequence model with geometry-aware data augmentation for robust, smooth 3D multi-person pose estimation from monocular videos, excelling in unseen views and occlusion scenarios.
Contribution
We propose POTR-3D, the first sequence-to-sequence 2D-to-3D lifting model for multi-person pose estimation, enhanced by a geometry-aware data augmentation strategy for improved robustness and smoothness.
Findings
Achieves state-of-the-art results on public benchmarks.
Robustly generalizes to unseen views and occlusions.
Produces more natural and smoother pose outputs.
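The smoothness finding can be made concrete with a simple jitter measure. The sketch below is an illustrative assumption, not necessarily the metric the paper reports: it computes the mean magnitude of the second finite difference of joint positions over time. The function name `mean_acceleration` and the `(frames, joints, 3)` array layout are hypothetical choices for this example.

```python
import numpy as np


def mean_acceleration(poses, fps=1.0):
    """Mean joint acceleration magnitude over a pose sequence.

    poses: array of shape (frames, joints, 3), assumed layout.
    fps:   frame rate used to scale the finite difference.
    """
    poses = np.asarray(poses, dtype=float)
    if poses.shape[0] < 3:
        raise ValueError("need at least 3 frames for a second difference")
    # Second-order finite difference along the time axis.
    accel = (poses[2:] - 2.0 * poses[1:-1] + poses[:-2]) * fps ** 2
    # Euclidean norm per joint per frame, averaged over everything.
    return float(np.linalg.norm(accel, axis=-1).mean())
```

A lower value indicates a smoother trajectory: constant-velocity motion scores zero, while frame-to-frame noise raises the score.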
Abstract
3D pose estimation is an invaluable task in computer vision with various practical applications. In particular, 3D multi-person pose estimation from a monocular video (3DMPPE) is challenging and still largely uncharted, and remains far from applicable to in-the-wild scenarios. We identify three unresolved issues with existing methods: lack of robustness to views unseen during training, vulnerability to occlusion, and severe jittering in the output. As a remedy, we propose POTR-3D, the first realization of a sequence-to-sequence 2D-to-3D lifting model for 3DMPPE, powered by a novel geometry-aware data augmentation strategy capable of generating unbounded data with a variety of views while accounting for the ground plane and occlusions. Through extensive experiments, we verify that the proposed model and data augmentation generalize robustly to diverse unseen views, robustly…
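To illustrate what generating new views "while caring about the ground plane" can mean geometrically, here is a minimal sketch. It is not the authors' augmentation pipeline and it does not model occlusion. It assumes a world frame whose y axis is vertical, so rotating about y and shifting in the x-z plane leaves every joint's height above the ground unchanged. It then projects the result with a simple pinhole camera. All function names, the focal length, the image center, and the camera offset are hypothetical values chosen for the example.

```python
import numpy as np


def rotate_about_vertical(points, angle_rad):
    """Rotate (..., 3) points about the y axis (assumed to be vertical)."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    rot = np.array([[c, 0.0, s],
                    [0.0, 1.0, 0.0],
                    [-s, 0.0, c]])
    return points @ rot.T


def project_pinhole(points, focal=1000.0, center=(640.0, 360.0)):
    """Project (..., 3) camera-space points to (..., 2) pixel coordinates."""
    z = points[..., 2:3]
    if np.any(z <= 0):
        raise ValueError("all points must lie in front of the camera (z > 0)")
    return points[..., :2] / z * focal + np.asarray(center)


def augment_view(poses_3d, angle_rad, shift_xz, camera_offset=(0.0, 0.0, 5.0)):
    """Create one synthetic view of a multi-person sequence.

    poses_3d: (frames, persons, joints, 3), assumed layout.
    shift_xz: translation on the ground plane (x and z only).
    Returns the transformed 3D poses and their 2D projections.
    """
    poses_3d = np.asarray(poses_3d, dtype=float)
    rotated = rotate_about_vertical(poses_3d, angle_rad)
    shift = np.array([shift_xz[0], 0.0, shift_xz[1]])
    moved = rotated + shift + np.asarray(camera_offset, dtype=float)
    return moved, project_pinhole(moved)
```

Sampling `angle_rad` and `shift_xz` at random yields an unlimited number of paired 2D/3D training sequences from one source sequence, which is the general idea behind view augmentation for 2D-to-3D lifting.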
Videos
Towards Robust and Smooth 3D Multi-Person Pose Estimation from Monocular Videos in the Wild (YouTube)
Taxonomy
Topics: Human Pose and Action Recognition · Advanced Vision and Imaging · Video Surveillance and Tracking Methods
