Semantic Pose using Deep Networks Trained on Synthetic RGB-D
Jeremie Papon, Markus Schoeler

TL;DR
This paper presents a deep neural network approach to indoor scene understanding from RGB-D images that recognizes furniture and efficiently estimates its pose and location, even in cluttered, noisy environments.
Contribution
The authors introduce a multi-output CNN, trained on synthetically rendered scenes and then fine-tuned on real annotated RGB-D data, for real-time furniture instance detection and pose estimation in indoor scenes.
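A single network that predicts class, pose, and location together is commonly trained with a weighted sum of per-task losses. The sketch below illustrates that idea with the standard library only, under assumptions that this summary does not state: pose is discretized into yaw bins and scored with cross-entropy, location is a box regressed with a smooth L1 loss, and the pose and location terms are skipped for background samples. The function names and weights (`multi_output_loss`, `w_pose`, `w_loc`, `background_class`) are hypothetical and are not the authors' formulation.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Negative log-probability of the target index."""
    return -math.log(softmax(logits)[target])


def smooth_l1(pred: Sequence[float], target: Sequence[float]) -> float:
    """Smooth L1 (Huber with delta=1) summed over box coordinates."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total


def multi_output_loss(
    class_logits: Sequence[float],
    class_target: int,
    pose_logits: Sequence[float],
    pose_bin: int,
    box_pred: Sequence[float],
    box_target: Sequence[float],
    w_pose: float = 1.0,            # hypothetical task weight
    w_loc: float = 1.0,             # hypothetical task weight
    background_class: int = 0,      # assumed index of "no object"
) -> float:
    """Hypothetical joint loss: class + pose bin + box location."""
    loss = cross_entropy(class_logits, class_target)
    # Pose and location are only meaningful when an object is present.
    if class_target != background_class:
        loss += w_pose * cross_entropy(pose_logits, pose_bin)
        loss += w_loc * smooth_l1(box_pred, box_target)
    return loss
```

For example, `multi_output_loss([2.0, 0.5, -1.0], 1, [0.1] * 8, 3, [0.4, 0.6, 1.2, 1.0], [0.5, 0.5, 1.0, 1.0])` scores one foreground prediction with 3 classes and 8 assumed pose bins. Sharing one backbone and summing the head losses is what lets the three outputs be predicted in a single forward pass.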
Findings
Successfully annotates challenging real scenes
Operates in real-time on GPU
Performs well with limited and noisy data
Abstract
In this work we address the problem of indoor scene understanding from RGB-D images. Specifically, we propose to find instances of common furniture classes, their spatial extent, and their pose with respect to generalized class models. To accomplish this, we use a deep, wide, multi-output convolutional neural network (CNN) that predicts class, pose, and location of possible objects simultaneously. To overcome the lack of large annotated RGB-D training sets (especially those with pose), we use an on-the-fly rendering pipeline that generates realistic cluttered room scenes in parallel to training. We then perform transfer learning on the relatively small amount of publicly available annotated RGB-D data, and find that our model is able to successfully annotate even highly challenging real scenes. Importantly, our trained network is able to understand noisy and sparse observations of…
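The abstract mentions an on-the-fly rendering pipeline that generates cluttered room scenes in parallel to training. No implementation details are given here, so the following is only a generic producer/consumer sketch of that idea using Python's standard library: renderer threads push freshly generated samples into a bounded queue while the training loop pulls batches from it. `render_random_scene` is a hypothetical stand-in that returns a toy scene description rather than rendered RGB-D images, and the actual training step is left as a comment.

```python
import queue
import random
import threading

FURNITURE = ["chair", "table", "sofa", "bed"]  # illustrative class list


def render_random_scene(rng: random.Random) -> dict:
    """Hypothetical renderer stub: returns a toy scene, not real RGB-D."""
    labels = [rng.choice(FURNITURE) for _ in range(rng.randint(1, 5))]
    return {"labels": labels, "yaw_bins": [rng.randrange(8) for _ in labels]}


def renderer_worker(worker_id: int, out_q: queue.Queue,
                    stop: threading.Event, seed: int) -> None:
    """Keep producing scenes until asked to stop."""
    rng = random.Random(seed + worker_id)
    while not stop.is_set():
        sample = render_random_scene(rng)
        try:
            # Timeout lets the worker re-check `stop` when the queue is full.
            out_q.put(sample, timeout=0.1)
        except queue.Full:
            continue


def train_with_online_rendering(num_steps: int, batch_size: int = 4,
                                num_workers: int = 2, seed: int = 0) -> int:
    """Consume freshly rendered batches; returns number of samples seen."""
    samples_q: queue.Queue = queue.Queue(maxsize=64)
    stop = threading.Event()
    workers = [
        threading.Thread(target=renderer_worker,
                         args=(i, samples_q, stop, seed), daemon=True)
        for i in range(num_workers)
    ]
    for w in workers:
        w.start()
    seen = 0
    try:
        for _ in range(num_steps):
            batch = [samples_q.get() for _ in range(batch_size)]
            # train_step(batch) would go here (hypothetical).
            seen += len(batch)
    finally:
        stop.set()
        for w in workers:
            w.join()
    return seen
```

For example, `train_with_online_rendering(100, batch_size=8)` consumes 800 freshly generated samples, so no scene is ever reused. The bounded queue keeps renderers from racing far ahead of the trainer; a real CPU-heavy renderer would more likely run in separate processes (for example via `multiprocessing`) to sidestep Python's global interpreter lock.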
Taxonomy
Topics: Advanced Vision and Imaging · Video Surveillance and Tracking Methods · Robotics and Sensor-Based Localization
