3D Object Recognition By Corresponding and Quantizing Neural 3D Scene Representations

Mihir Prabhudesai; Shamit Lal; Hsiao-Yu Fish Tung; Adam W. Harley; Shubhankar Potdar; Katerina Fragkiadaki

arXiv:2010.16279 · cs.CV · November 2, 2020 · 1 citation

Open Access

TL;DR

This paper introduces a self-supervised 3D object recognition system that learns view-invariant 3D features from RGB-D images, enabling accurate object detection and pose estimation without relying on labeled 3D data.

Contribution

The proposed model maps RGB-D images to 3D feature maps and clusters them into prototypes, allowing pose and scale estimation without strong supervision.

Findings

01. Outperforms baselines in object retrieval and pose estimation

02. Features are invariant to viewpoint and scale changes

03. Enables unsupervised learning of 3D object representations

Abstract

We propose a system that learns to detect objects and infer their 3D poses in RGB-D images. Many existing systems can identify objects and infer 3D poses, but they heavily rely on human labels and 3D annotations. The challenge here is to achieve this without relying on strong supervision signals. To address this challenge, we propose a model that maps RGB-D images to a set of 3D visual feature maps in a differentiable fully-convolutional manner, supervised by predicting views. The 3D feature maps correspond to a featurization of the 3D world scene depicted in the images. The object 3D feature representations are invariant to camera viewpoint changes or zooms, which means feature matching can identify similar objects under different camera viewpoints. We can compare the 3D feature maps of two objects by searching alignment across scales and 3D rotations, and, as a result of the…
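The alignment search mentioned in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a small cubic feature grid stored as a dictionary, restricts the search to four 90-degree rotations about the vertical axis, omits the scale search entirely, and scores each candidate with cosine similarity. The names `rotate_yaw_90`, `cosine_similarity`, and `best_yaw_alignment` are hypothetical.

```python
import math
from itertools import product


def rotate_yaw_90(grid, n):
    """Rotate a cubic grid {(x, y, z): feature} by 90 degrees about the y axis."""
    return {(z, y, n - 1 - x): feat for (x, y, z), feat in grid.items()}


def cosine_similarity(grid_a, grid_b):
    """Cosine similarity of two grids, each treated as one flattened vector."""
    dot = norm_a = norm_b = 0.0
    for key, feat_a in grid_a.items():
        feat_b = grid_b[key]
        dot += sum(a * b for a, b in zip(feat_a, feat_b))
        norm_a += sum(a * a for a in feat_a)
        norm_b += sum(b * b for b in feat_b)
    denom = math.sqrt(norm_a) * math.sqrt(norm_b)
    return dot / denom if denom > 0 else 0.0


def best_yaw_alignment(grid_a, grid_b, n):
    """Try 0..3 quarter turns of grid_b and return (quarter_turns, score)."""
    best_turns, best_score = 0, float("-inf")
    rotated = grid_b
    for turns in range(4):
        score = cosine_similarity(grid_a, rotated)
        if score > best_score:
            best_turns, best_score = turns, score
        rotated = rotate_yaw_90(rotated, n)
    return best_turns, best_score


# Toy data (made up for illustration): a 3x3x3 grid of 3-channel features
# that is not symmetric under yaw rotation.
n = 3
grid_a = {
    (x, y, z): (x + 1.0, 2.0 * y + z, float(x * z))
    for x, y, z in product(range(n), repeat=3)
}

# The "second view" is the same object rotated by one quarter turn.
grid_b = rotate_yaw_90(grid_a, n)

turns, score = best_yaw_alignment(grid_a, grid_b, n)
print(turns, round(score, 6))
```

In this toy setting the best-scoring rotation doubles as a relative pose estimate between the two grids. A fuller search would also cover finer rotation angles and several scales, which requires resampling the grid rather than permuting its cells.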

Taxonomy

Topics: Advanced Neural Network Applications · Robotics and Sensor-Based Localization · Advanced Image and Video Retrieval Techniques