BB8: A Scalable, Accurate, Robust to Partial Occlusion Method for Predicting the 3D Poses of Challenging Objects without Using Depth
Mahdi Rad, Vincent Lepetit

TL;DR
This paper presents BB8, a scalable and robust method for 3D object detection and pose estimation from color images, effectively handling partial occlusions and symmetric objects without using depth data.
Contribution
The authors introduce a holistic CNN-based approach with pose range classification and optional refinement, achieving state-of-the-art results on multiple challenging datasets.
Findings
Improved the state of the art on LINEMOD from 73.7% to 89.3% of correctly registered RGB frames.
First to report results on T-LESS with color images only.
Scalable: a single network can be trained for multiple objects simultaneously.
Abstract
We introduce a novel method for 3D object detection and pose estimation from color images only. We first use segmentation to detect the objects of interest in 2D even in presence of partial occlusions and cluttered background. By contrast with recent patch-based methods, we rely on a "holistic" approach: We apply to the detected objects a Convolutional Neural Network (CNN) trained to predict their 3D poses in the form of 2D projections of the corners of their 3D bounding boxes. This, however, is not sufficient for handling objects from the recent T-LESS dataset: These objects exhibit an axis of rotational symmetry, and the similarity of two images of such an object under two different poses makes training the CNN challenging. We solve this problem by restricting the range of poses used for training, and by introducing a classifier to identify the range of a pose at run-time before…
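The pose representation described in the abstract, the 2D projections of the eight corners of an object's 3D bounding box, can be illustrated with a minimal sketch. This is not the authors' code: the function names, the box size, and the pinhole camera intrinsics below are hypothetical values chosen only for illustration. Given a rotation and translation, it computes the 16 numbers (8 corners × 2 image coordinates) that a network of this kind would regress; the pose can then be recovered from these 2D-3D correspondences with a PnP solver.

```python
from itertools import product


def bounding_box_corners(half_extents):
    """Return the 8 corners of an axis-aligned 3D box centred at the origin."""
    hx, hy, hz = half_extents
    return [(sx * hx, sy * hy, sz * hz) for sx, sy, sz in product((-1, 1), repeat=3)]


def project_corners(corners, rotation, translation, focal, principal_point):
    """Project 3D corners into the image with a pinhole camera model.

    rotation is a 3x3 matrix (list of rows), translation a 3-vector;
    both map object coordinates to camera coordinates.
    """
    fx, fy = focal
    cx, cy = principal_point
    projected = []
    for corner in corners:
        # Camera coordinates: X_cam = R * X_obj + t
        cam = [
            sum(rotation[r][c] * corner[c] for c in range(3)) + translation[r]
            for r in range(3)
        ]
        # Perspective division followed by the intrinsics
        projected.append((fx * cam[0] / cam[2] + cx, fy * cam[1] / cam[2] + cy))
    return projected


# Hypothetical example: a 10 cm cube placed 1 m in front of the camera.
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
corners = bounding_box_corners((0.05, 0.05, 0.05))
targets = project_corners(
    corners, IDENTITY, (0.0, 0.0, 1.0), focal=(500.0, 500.0), principal_point=(320.0, 240.0)
)

# Flattened regression target: 8 corners x 2 coordinates = 16 values.
flat_target = [value for uv in targets for value in uv]
```

Representing the pose as 2D points keeps the regression target in image space, the same space as the network's input.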
Taxonomy
Topics: Advanced Neural Network Applications · Robot Manipulation and Learning · Human Pose and Action Recognition
