TL;DR
This paper introduces SURREAL, a large-scale synthetic dataset of human images with ground-truth annotations, and shows that CNNs trained on it can perform human depth estimation and part segmentation on real images.
Contribution
The creation of SURREAL, a synthetic dataset of more than 6 million frames with ground-truth pose, depth maps, and segmentation masks, which makes it possible to train models for human analysis without extensive manual labeling.
Findings
CNNs trained on SURREAL generalize to real RGB images
Synthetic data enables accurate human depth estimation
The dataset improves human part segmentation results
Abstract
Estimating human pose, shape, and motion from images and videos are fundamental challenges with many applications. Recent advances in 2D human pose estimation use large amounts of manually-labeled training data for learning convolutional neural networks (CNNs). Such data is time consuming to acquire and difficult to extend. Moreover, manual labeling of 3D pose, depth and motion is impractical. In this work we present SURREAL (Synthetic hUmans foR REAL tasks): a new large-scale dataset with synthetically-generated but realistic images of people rendered from 3D sequences of human motion capture data. We generate more than 6 million frames together with ground truth pose, depth maps, and segmentation masks. We show that CNNs trained on our synthetic dataset allow for accurate human depth estimation and human part segmentation in real RGB images. Our results and the new dataset open up new…
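The abstract says each frame ships with ground-truth depth maps and segmentation masks. As an illustration only, the sketch below shows how such per-pixel ground truth could be used to score a model's output: per-part intersection-over-union (IoU) for segmentation, and root-mean-square error over foreground pixels for depth. The metric choices, the convention that label 0 marks background, the use of plain nested lists instead of image arrays, and the helper names `part_iou` and `foreground_depth_rmse` are all assumptions for this example. They do not describe SURREAL's file format or the paper's evaluation protocol.

```python
import math
from typing import Dict, List

# Hypothetical representations (assumptions, not SURREAL's on-disk format):
Mask = List[List[int]]        # per-pixel part label; 0 is assumed to mean background
DepthMap = List[List[float]]  # per-pixel depth; units assumed consistent across maps


def part_iou(pred: Mask, gt: Mask, num_parts: int) -> Dict[int, float]:
    """Hypothetical helper: IoU for each foreground part label 1..num_parts.

    Parts that appear in neither the prediction nor the ground truth are skipped.
    """
    scores: Dict[int, float] = {}
    for part in range(1, num_parts + 1):
        inter = union = 0
        for pred_row, gt_row in zip(pred, gt):
            for p, g in zip(pred_row, gt_row):
                in_pred, in_gt = p == part, g == part
                inter += int(in_pred and in_gt)
                union += int(in_pred or in_gt)
        if union:
            scores[part] = inter / union
    return scores


def foreground_depth_rmse(pred: DepthMap, gt: DepthMap, gt_mask: Mask) -> float:
    """Hypothetical helper: RMSE of depth over ground-truth foreground pixels only."""
    sq_err, count = 0.0, 0
    for pred_row, gt_row, mask_row in zip(pred, gt, gt_mask):
        for p, g, m in zip(pred_row, gt_row, mask_row):
            if m != 0:  # ignore background pixels
                sq_err += (p - g) ** 2
                count += 1
    if count == 0:
        raise ValueError("ground-truth mask has no foreground pixels")
    return math.sqrt(sq_err / count)


# Toy 2x3 example: label 1 and 2 are two body parts, 0 is background.
gt_mask = [[0, 1, 1],
           [0, 2, 2]]
pred_mask = [[0, 1, 2],
             [0, 2, 2]]
gt_depth = [[9.0, 2.0, 2.0],
            [9.0, 2.5, 2.5]]
pred_depth = [[5.0, 2.1, 1.9],
              [5.0, 2.5, 2.5]]

print(part_iou(pred_mask, gt_mask, num_parts=2))  # {1: 0.5, 2: 0.6666666666666666}
print(foreground_depth_rmse(pred_depth, gt_depth, gt_mask))  # about 0.0707
```

Restricting the depth error to foreground pixels keeps large background discrepancies (here, 5.0 versus 9.0) from dominating a score that is meant to measure the person.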
