TL;DR
This paper introduces SPIN, a collaborative approach that combines deep learning and iterative optimization for 3D human pose and shape estimation, improving accuracy and efficiency, especially when 3D ground truth is limited.
Contribution
The paper proposes a novel self-improving method that integrates iterative model fitting with deep network training, improving both the accuracy and the speed of 3D human pose estimation.
Findings
Outperforms state-of-the-art methods in various settings
Effective with limited or no 3D ground truth data
Self-improving loop enhances both network and optimization results
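The self-improving loop can be illustrated with a deliberately tiny toy problem. This is a hypothetical sketch, not the paper's implementation. The "body model" is a single joint angle that produces one 2D keypoint on the unit circle. The "network" is a linear regressor from a made-up image feature to that angle. "Fitting" is a few gradient-descent steps on the 2D reprojection loss. Each round has three steps: the regressor initializes the fit, the fit refines the estimate using only 2D evidence, and the refined fits then supervise the regressor. The true angles are used only to measure error. All names, step counts, and learning rates are illustrative assumptions.

```python
import math

# Hypothetical toy stand-ins (not SPIN's actual components):
#   "body model": one joint angle theta -> a 2D keypoint on the unit circle
#   "regressor":  a linear map from an image feature x to theta
#   "fitting":    a few gradient-descent steps on the 2D reprojection loss

def keypoint(theta):
    return math.cos(theta), math.sin(theta)

def fit(theta0, obs, steps=5, lr=0.2):
    """Refine theta, starting from the regressor's estimate theta0, using only 2D evidence."""
    theta = theta0
    for _ in range(steps):
        kx, ky = keypoint(theta)
        # Gradient of (cos t - ox)^2 + (sin t - oy)^2 with respect to t.
        grad = -2 * (kx - obs[0]) * math.sin(theta) + 2 * (ky - obs[1]) * math.cos(theta)
        theta -= lr * grad
    return theta

def train_regressor(xs, targets):
    """Closed-form least squares for targets ~ w * x + b."""
    mx = sum(xs) / len(xs)
    mt = sum(targets) / len(targets)
    cov = sum((x - mx) * (t - mt) for x, t in zip(xs, targets))
    var = sum((x - mx) ** 2 for x in xs)
    w = cov / var
    return w, mt - w * mx

def self_improve(rounds=4, n=21):
    true_thetas = [-1 + 2 * i / (n - 1) for i in range(n)]  # used only for evaluation
    xs = [2 * t for t in true_thetas]                       # toy "image features"
    obs = [keypoint(t) for t in true_thetas]                # 2D annotations only
    w, b = 0.0, 0.0                                         # untrained regressor
    history = []
    for _ in range(rounds):
        inits = [w * x + b for x in xs]                     # 1) regress an initial estimate
        fits = [fit(t0, o) for t0, o in zip(inits, obs)]    # 2) refine it by iterative fitting
        w, b = train_regressor(xs, fits)                    # 3) supervise the regressor with the fits
        err = sum(abs(w * x + b - t) for x, t in zip(xs, true_thetas)) / n
        history.append(err)                                 # regressor's mean absolute angle error
    return history
```

Calling `self_improve()` returns the regressor's mean absolute angle error after each round. In this toy the error shrinks from round to round even though no ground-truth angle is ever used for training. Better regression gives the fixed-budget fitting a better starting point, and better fits give the regressor better targets.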
Abstract
Model-based human pose estimation is currently approached through two different paradigms. Optimization-based methods fit a parametric body model to 2D observations in an iterative manner, leading to accurate image-model alignments, but are often slow and sensitive to the initialization. In contrast, regression-based methods, which use a deep network to directly estimate the model parameters from pixels, tend to provide reasonable, but not pixel-accurate, results while requiring huge amounts of supervision. In this work, instead of investigating which approach is better, our key insight is that the two paradigms can form a strong collaboration. A reasonable, directly regressed estimate from the network can initialize the iterative optimization, making the fitting faster and more accurate. Similarly, a pixel-accurate fit from iterative optimization can act as strong supervision for the…
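The abstract makes two claims about iterative fitting: it depends on its initialization, and a good regressed estimate makes it faster. Both can be illustrated with a hypothetical toy. This is not the paper's fitting procedure. A single joint angle `phi` is observed only as the 2D point `(cos phi, sin phi)`, and gradient descent on the reprojection loss recovers it. The learning rate, tolerance, and starting values are arbitrary assumptions. The function reports how many iterations each starting point needs before the loss drops below the tolerance.

```python
import math

# Hypothetical toy fitting problem: recover one joint angle phi from its 2D
# keypoint (cos phi, sin phi) by gradient descent, counting iterations.

def fit_until_converged(theta0, phi, lr=0.2, tol=1e-6, max_steps=1000):
    """Return (theta, iterations needed for the reprojection loss to drop below tol)."""
    ox, oy = math.cos(phi), math.sin(phi)
    theta = theta0
    for step in range(max_steps):
        dx, dy = math.cos(theta) - ox, math.sin(theta) - oy
        if dx * dx + dy * dy < tol:              # reprojection loss is small enough
            return theta, step
        # Derivative of dx^2 + dy^2 with respect to theta.
        grad = -2 * dx * math.sin(theta) + 2 * dy * math.cos(theta)
        theta -= lr * grad
    return theta, max_steps

phi = 1.0
_, warm = fit_until_converged(0.95, phi)                 # close, "regressed" estimate
_, cold = fit_until_converged(0.0, phi)                  # generic default pose
_, bad = fit_until_converged(phi + math.pi - 1e-3, phi)  # almost the opposite pose
```

In this toy, the near-correct start converges in the fewest iterations and the generic start takes longer. The start near the opposite pose begins close to a stationary point of the loss, where the gradient nearly vanishes, so it needs the most iterations.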
