Deep Exemplar 2D-3D Detection by Adapting from Real to Rendered Views

Francisco Massa; Bryan Russell; Mathieu Aubry

arXiv:1512.02497 · cs.CV · April 19, 2016

TL;DR

This paper introduces an end-to-end CNN for 2D-3D exemplar detection that adapts the features of natural images to align with those of CAD rendered views, bringing the accuracy and speed of recent CNN detection pipelines to 2D-3D exemplar detection.

Contribution

The paper presents a CNN approach that learns to adapt the features of natural images to align with those of CAD rendered views, learning this adaptation from rendered views of textured object models composited onto natural images.
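The contribution centers on mapping features of real images toward those of rendered views. As a minimal sketch, and as a simplification rather than the authors' architecture (which learns the adaptation inside the CNN), the snippet fits a single linear map by least squares from hypothetical "real" feature vectors to paired "rendered" feature vectors, then scores one real query against a bank of rendered exemplars with cosine similarity. The toy data, the names `fit_linear_adaptation` and `cosine_scores`, and the choice of cosine similarity are all assumptions for illustration.

```python
import numpy as np

def fit_linear_adaptation(real_feats, rendered_feats):
    """Least-squares map W so that real_feats @ W approximates rendered_feats.

    Hypothetical stand-in for a learned adaptation layer.
    real_feats, rendered_feats: arrays of shape (n_pairs, d); row i of each
    describes the same object view (real-looking image vs. clean render).
    """
    W, *_ = np.linalg.lstsq(real_feats, rendered_feats, rcond=None)
    return W

def cosine_scores(query, exemplars):
    """Cosine similarity between one query feature and each exemplar row."""
    q = query / np.linalg.norm(query)
    e = exemplars / np.linalg.norm(exemplars, axis=1, keepdims=True)
    return e @ q

rng = np.random.default_rng(0)
d, n = 16, 200
# Toy assumption: "real" features are a fixed linear distortion of "rendered" ones.
rendered = rng.normal(size=(n, d))
distortion = np.eye(d) + 0.1 * rng.normal(size=(d, d))
real = rendered @ distortion

W = fit_linear_adaptation(real, rendered)
adapted = real @ W
print(np.allclose(adapted, rendered, atol=1e-6))  # True

# Match one adapted real query against the bank of rendered exemplars.
scores = cosine_scores(real[7] @ W, rendered)
print(int(np.argmax(scores)))  # 7
```

Without the adaptation step, scoring `real[7]` directly against `rendered` compares features from two different distributions; the fitted map is what brings them into a common space in this toy setting.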

Findings

01

Achieved higher detection accuracy on the IKEA dataset (instance detection).

02

Outperformed Aubry et al. on "chair" detection on a subset of the Pascal VOC dataset (object category detection).

03

Showed that adapting natural-image features to align with rendered-view features is critical to the method's success.

Abstract

This paper presents an end-to-end convolutional neural network (CNN) for 2D-3D exemplar detection. We demonstrate that the ability to adapt the features of natural images to better align with those of CAD rendered views is critical to the success of our technique. We show that the adaptation can be learned by compositing rendered views of textured object models on natural images. Our approach can be naturally incorporated into a CNN detection pipeline and extends the accuracy and speed benefits from recent advances in deep learning to 2D-3D exemplar detection. We applied our method to two tasks: instance detection, where we evaluated on the IKEA dataset, and object category detection, where we outperform Aubry et al. for "chair" detection on a subset of the Pascal VOC dataset.
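The abstract states that the adaptation is learned by compositing rendered views of textured object models on natural images. A minimal sketch of that compositing step, assuming standard alpha ("over") compositing of an RGBA render onto an RGB background with NumPy; `composite_render`, the array layout, and the toy images are hypothetical and not taken from the paper. One plausible use is to pair each composited image with the clean render of the same view when training the adaptation.

```python
import numpy as np

def composite_render(render_rgba, background_rgb, top, left):
    """Alpha-composite an RGBA rendering onto a natural-image background.

    render_rgba: float array (h, w, 4) in [0, 1]; alpha is 0 outside the
    object silhouette. background_rgb: float array (H, W, 3) in [0, 1].
    (top, left): non-negative position of the render's upper-left corner.
    Returns a new image; background_rgb itself is left unmodified.
    """
    h, w = render_rgba.shape[:2]
    out = background_rgb.copy()
    patch = out[top:top + h, left:left + w]
    if patch.shape[:2] != (h, w):
        raise ValueError("render does not fit inside the background at this offset")
    rgb, alpha = render_rgba[..., :3], render_rgba[..., 3:4]
    # "Over" operator: object pixels replace the background where alpha = 1.
    out[top:top + h, left:left + w] = alpha * rgb + (1.0 - alpha) * patch
    return out

rng = np.random.default_rng(0)
background = rng.uniform(size=(64, 64, 3))   # stand-in for a natural photo
render = np.zeros((16, 16, 4))
render[4:12, 4:12] = [1.0, 0.0, 0.0, 1.0]     # opaque red square as the "object"
img = composite_render(render, background, top=10, left=20)

print(img[10 + 5, 20 + 5])                              # [1. 0. 0.]
print(np.array_equal(img[10, 20], background[10, 20]))  # True (alpha = 0 there)
```

Because the composited image and the clean render share the same object pose, features computed from the two form a natural training pair for a real-to-rendered mapping.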
