Fusing Local Similarities for Retrieval-based 3D Orientation Estimation of Unseen Objects
Chen Zhao, Yinlin Hu, Mathieu Salzmann

TL;DR
This paper presents a retrieval-based approach for estimating the 3D orientation of unseen objects from monocular images, using local similarity fusion and fast retrieval to improve generalization and efficiency.
Contribution
The authors introduce an adaptive fusion module for aggregating local similarities and a fast retrieval strategy, enabling better generalization to unseen objects in 3D orientation estimation.
Findings
Outperforms previous methods on LineMOD, LineMOD-Occluded, and T-LESS datasets.
Significantly improves generalization to unseen objects.
Provides a fast retrieval process for practical applications.
Abstract
In this paper, we tackle the task of estimating the 3D orientation of previously unseen objects from monocular images. This task contrasts with the one considered by most existing deep learning methods, which typically assume that the testing objects have been observed during training. To handle the unseen objects, we follow a retrieval-based strategy and prevent the network from learning object-specific features by computing multi-scale local similarities between the query image and synthetically generated reference images. We then introduce an adaptive fusion module that robustly aggregates the local similarities into a global similarity score of pairwise images. Furthermore, we speed up the retrieval process by developing a fast retrieval strategy. Our experiments on the LineMOD, LineMOD-Occluded, and T-LESS datasets show that our method yields a significantly better generalization to…
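To make the retrieval idea in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes each image has already been encoded into a feature pyramid (one list of per-location feature vectors per scale), computes a cosine similarity at every location, and fuses those local scores with a softmax-weighted average. That weighting is a hypothetical stand-in for the learned adaptive fusion module, whose details are not given here. The function names, the `temperature` parameter, and the orientation labels are all illustrative assumptions, and the search is a plain exhaustive scan rather than the paper's fast retrieval strategy.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def local_similarities(query_map, ref_map):
    """Per-location cosine similarity between two feature maps.

    Each map is a flat list of feature vectors, one per spatial location.
    """
    return [cosine(q, r) for q, r in zip(query_map, ref_map)]


def fuse(sims, temperature=1.0):
    """Softmax-weighted average of local similarities.

    Assumption: a hand-written stand-in for a learned fusion module.
    Locations with higher similarity receive larger weights.
    """
    scaled = [s / temperature for s in sims]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return sum(w * s for w, s in zip(weights, sims)) / total


def global_similarity(query_pyramid, ref_pyramid):
    """Average the fused score over all scales of the pyramid."""
    scores = [
        fuse(local_similarities(q, r))
        for q, r in zip(query_pyramid, ref_pyramid)
    ]
    return sum(scores) / len(scores)


def retrieve_orientation(query_pyramid, references):
    """Return the label of the best-matching reference (exhaustive scan).

    `references` is a list of (orientation_label, feature_pyramid) pairs.
    """
    best = max(references, key=lambda ref: global_similarity(query_pyramid, ref[1]))
    return best[0]


# Toy data: one scale, two locations, 2-D features (purely illustrative).
query = [[[1.0, 0.0], [0.0, 1.0]]]
references = [
    ("rot_a", [[[1.0, 0.0], [0.0, 1.0]]]),  # identical to the query
    ("rot_b", [[[0.0, 1.0], [1.0, 0.0]]]),  # orthogonal at every location
]
print(retrieve_orientation(query, references))
```

In this toy case the query matches the first reference exactly, so its fused score is 1 while the second reference scores 0. Because the comparison only ever sees local feature agreement between a query and rendered references, nothing in the scoring step is tied to a particular object identity, which is the intuition behind the generalization claim.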
