3D Object Recognition By Corresponding and Quantizing Neural 3D Scene Representations
Mihir Prabhudesai, Shamit Lal, Hsiao-Yu Fish Tung, Adam W. Harley, Shubhankar Potdar, Katerina Fragkiadaki

TL;DR
This paper introduces a self-supervised 3D object recognition system that learns view-invariant 3D features from RGB-D images, enabling accurate object detection and pose estimation without relying on labeled 3D data.
Contribution
The proposed model maps RGB-D images to 3D feature maps and clusters them into prototypes, allowing pose and scale estimation without strong supervision.
Findings
Outperforms baselines in object retrieval and pose estimation
Features are invariant to viewpoint and scale changes
Enables unsupervised learning of 3D object representations
Abstract
We propose a system that learns to detect objects and infer their 3D poses in RGB-D images. Many existing systems can identify objects and infer 3D poses, but they heavily rely on human labels and 3D annotations. The challenge here is to achieve this without relying on strong supervision signals. To address this challenge, we propose a model that maps RGB-D images to a set of 3D visual feature maps in a differentiable fully-convolutional manner, supervised by predicting views. The 3D feature maps correspond to a featurization of the 3D world scene depicted in the images. The object 3D feature representations are invariant to camera viewpoint changes or zooms, which means feature matching can identify similar objects under different camera viewpoints. We can compare the 3D feature maps of two objects by searching alignment across scales and 3D rotations, and, as a result of the…
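The alignment search mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a feature grid laid out as (C, D, H, W) with equal depth and width, searches only the four 90-degree rotations about the vertical axis, and omits the scale search entirely. The function names `cosine_similarity` and `best_yaw_alignment` are hypothetical and chosen for illustration.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two feature grids, flattened to vectors."""
    a, b = a.ravel(), b.ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def best_yaw_alignment(feat_a, feat_b):
    """Search the four quarter-turn rotations about the vertical axis.

    feat_a, feat_b: arrays of shape (C, D, H, W) with D == W
    (an assumed layout, not taken from the paper).
    Returns (best_k, best_score): rotating feat_b by best_k quarter turns
    in the depth-width plane gives the highest similarity to feat_a.
    """
    scores = [
        cosine_similarity(feat_a, np.rot90(feat_b, k, axes=(1, 3)))
        for k in range(4)
    ]
    best_k = int(np.argmax(scores))
    return best_k, scores[best_k]


# Synthetic check: feat_b is feat_a rotated by three quarter turns,
# so one further quarter turn should restore the original grid.
rng = np.random.default_rng(0)
feat_a = rng.standard_normal((8, 6, 4, 6))
feat_b = np.rot90(feat_a, 3, axes=(1, 3))
k, score = best_yaw_alignment(feat_a, feat_b)
```

A fuller search would sample finer rotation angles and several scales, resampling the grid at each candidate before scoring. The discrete quarter-turn case is used here only because it needs no interpolation.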
Taxonomy
Topics: Advanced Neural Network Applications · Robotics and Sensor-Based Localization · Advanced Image and Video Retrieval Techniques
