Delving Deep Into Hybrid Annotations for 3D Human Recovery in the Wild
Yu Rong, Ziwei Liu, Cheng Li, Kaidi Cao, Chen Change Loy

TL;DR
This paper investigates the effectiveness and cost-efficiency of various annotations, especially dense correspondence, for 3D human recovery from single in-the-wild images, proposing a practical approach when 3D annotations are scarce.
Contribution
It provides a comprehensive analysis of annotation types, highlighting dense correspondence as a cost-effective alternative to 3D annotations for in-the-wild 3D human recovery.
Findings
When no paired in-the-wild 3D annotations are available, a model trained with dense correspondence reaches 92% of the performance of a model trained with paired 3D data.
Traditional 2D annotations are less effective in guiding 3D recovery.
DensePose is an effective annotation type for this task.
Abstract
Though much progress has been achieved in single-image 3D human recovery, estimating a 3D model for in-the-wild images remains a formidable challenge. The reason is that obtaining high-quality 3D annotations for in-the-wild images is an extremely hard task that consumes an enormous amount of resources and manpower. To tackle this problem, previous methods adopt a hybrid training strategy that exploits multiple heterogeneous types of annotations, including 3D and 2D, while leaving the efficacy of each annotation not thoroughly investigated. In this work, we aim to perform a comprehensive study on the cost and effectiveness trade-off between different annotations. Specifically, we focus on the challenging task of in-the-wild 3D human recovery from single images when paired 3D annotations are not fully available. Through extensive experiments, we obtain several observations: 1) 3D…
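The hybrid training strategy mentioned in the abstract can be pictured as a loss that sums whichever annotation terms happen to exist for each image. The sketch below is only an illustration under stated assumptions, not the authors' implementation: the function name `hybrid_loss`, the term names (`joints3d`, `keypoints2d`, `dense`), and the weights are all hypothetical, and the per-sample loss values are placeholders rather than results from the paper.

```python
def hybrid_loss(per_term_losses, availability, weights):
    """Combine per-sample losses from heterogeneous annotation types.

    per_term_losses: dict mapping a term name to a list of per-sample losses.
    availability:    dict mapping a term name to a list of booleans saying
                     whether that annotation exists for each sample.
    weights:         dict mapping a term name to its scalar weight.

    All names and weights here are hypothetical, chosen for illustration.
    """
    total = 0.0
    for name, losses in per_term_losses.items():
        # Keep only the samples that actually carry this annotation type.
        kept = [loss for loss, ok in zip(losses, availability[name]) if ok]
        if not kept:
            # No sample in the batch has this annotation; skip the term.
            continue
        # Average over annotated samples only, then apply the term weight.
        total += weights[name] * sum(kept) / len(kept)
    return total


# Placeholder batch of three images with different annotation coverage.
losses = {
    "joints3d": [1.0, 2.0, 3.0],
    "keypoints2d": [0.5, 0.5, 0.5],
    "dense": [4.0, 0.0, 2.0],
}
available = {
    "joints3d": [True, False, False],   # only the first image has 3D labels
    "keypoints2d": [True, True, True],  # every image has 2D keypoints
    "dense": [False, True, True],       # in-the-wild images have dense labels
}
term_weights = {"joints3d": 1.0, "keypoints2d": 0.5, "dense": 2.0}

batch_loss = hybrid_loss(losses, available, term_weights)
```

Averaging each term over only the samples that carry it keeps a sparsely annotated type (here, 3D joints) from being diluted by the unannotated images in the batch.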
Taxonomy
Topics: Video Surveillance and Tracking Methods · Advanced Vision and Imaging · Human Pose and Action Recognition
