SilhoNet: An RGB Method for 6D Object Pose Estimation

Gideon Billings; Matthew Johnson-Roberson

arXiv:1809.06893 · cs.CV · May 8, 2020

TL;DR

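SilhoNet is a CNN-based method that estimates 6D object pose from monocular RGB images. From ROI proposals it predicts an intermediate silhouette representation and a 3D translation vector, then regresses the 3D orientation from the predicted silhouettes. On the YCB-Video dataset it outperforms two state-of-the-art monocular pose estimation networks.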

Contribution

Introduces SilhoNet, a new monocular 6D pose estimation method using silhouette prediction and CNNs, eliminating the need for RGB-D sensors.

Findings

01

Achieves superior performance on the YCB-Video dataset

02

Outperforms two state-of-the-art monocular pose estimation networks

03

Effectively predicts 6D pose using only monocular RGB images

Abstract

Autonomous robot manipulation involves estimating the translation and orientation of the object to be manipulated as a 6-degree-of-freedom (6D) pose. Methods using RGB-D data have shown great success in solving this problem. However, there are situations where cost constraints or the working environment may limit the use of RGB-D sensors. When limited to monocular camera data only, the problem of object pose estimation is very challenging. In this work, we introduce a novel method called SilhoNet that predicts 6D object pose from monocular images. We use a Convolutional Neural Network (CNN) pipeline that takes in Region of Interest (ROI) proposals to simultaneously predict an intermediate silhouette representation for objects with an associated occlusion mask and a 3D translation vector. The 3D orientation is then regressed from the predicted silhouettes. We show that our method…
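To make the notion of a 6D pose in the abstract concrete, the following is a minimal sketch of how an orientation plus a 3D translation vector maps a point from the object model frame into the camera frame. The unit-quaternion encoding of orientation, the function names `quat_to_matrix` and `apply_pose`, and the numeric values are all assumptions chosen for illustration; the abstract does not state how SilhoNet parameterizes orientation.

```python
import math


def quat_to_matrix(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix.

    The quaternion is normalized first, so small numerical drift is tolerated.
    Assumption: orientation is encoded as a quaternion (illustrative choice).
    """
    w, x, y, z = q
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def apply_pose(q, t, point):
    """Map a 3D point from the object frame to the camera frame: R @ p + t."""
    R = quat_to_matrix(q)
    return tuple(
        sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)
    )


# Hypothetical pose: object rotated 90 degrees about the camera z-axis
# and placed 0.5 m in front of the camera.
q = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))
t = (0.0, 0.0, 0.5)

# A model point 0.1 m along the object's x-axis, expressed in the camera frame.
p_cam = apply_pose(q, t, (0.1, 0.0, 0.0))
print(p_cam)
```

The three rotational and three translational degrees of freedom together give the six degrees of freedom referred to in the abstract. In a pipeline like the one described, the translation and orientation would come from separate prediction stages before being combined this way.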
