Fusing Local Similarities for Retrieval-based 3D Orientation Estimation of Unseen Objects

Chen Zhao; Yinlin Hu; Mathieu Salzmann

arXiv:2203.08472·cs.CV·July 25, 2022

TL;DR

This paper presents a retrieval-based approach for estimating the 3D orientation of unseen objects from monocular images, using local similarity fusion and fast retrieval to improve generalization and efficiency.

Contribution

The authors introduce an adaptive fusion module for aggregating local similarities and a fast retrieval strategy, enabling better generalization to unseen objects in 3D orientation estimation.

Findings

01 Outperforms previous methods on LineMOD, LineMOD-Occluded, and T-LESS datasets.

02 Significantly improves generalization to unseen objects.

03 Provides a fast retrieval process for practical applications.

Abstract

In this paper, we tackle the task of estimating the 3D orientation of previously-unseen objects from monocular images. This task contrasts with the one considered by most existing deep learning methods which typically assume that the testing objects have been observed during training. To handle the unseen objects, we follow a retrieval-based strategy and prevent the network from learning object-specific features by computing multi-scale local similarities between the query image and synthetically-generated reference images. We then introduce an adaptive fusion module that robustly aggregates the local similarities into a global similarity score of pairwise images. Furthermore, we speed up the retrieval process by developing a fast retrieval strategy. Our experiments on the LineMOD, LineMOD-Occluded, and T-LESS datasets show that our method yields a significantly better generalization to…
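The retrieval idea in the abstract can be illustrated with a minimal sketch. It assumes the query and each reference image have already been encoded into feature maps at several scales (arrays of shape `(C, H, W)`), computes a per-location cosine similarity at each scale, and fuses all local similarities into one score. The function names, the cosine measure, and the softmax weighting are assumptions made for illustration; the abstract does not specify the internals of the adaptive fusion module, and the fast retrieval strategy is not modeled here (this sketch simply scores every reference).

```python
import numpy as np


def local_cosine_similarity(fq, fr, eps=1e-8):
    """Per-location cosine similarity between two (C, H, W) feature maps."""
    num = (fq * fr).sum(axis=0)
    den = np.linalg.norm(fq, axis=0) * np.linalg.norm(fr, axis=0) + eps
    return num / den  # shape (H, W)


def fuse(sim_maps, temperature=1.0):
    """Fuse multi-scale local similarities into one global score.

    Hypothetical stand-in for the paper's adaptive fusion module:
    a softmax over the local similarities supplies the weights.
    """
    s = np.concatenate([m.ravel() for m in sim_maps])
    w = np.exp((s - s.max()) / temperature)  # shift by max for stability
    w /= w.sum()
    return float((w * s).sum())


def retrieve(query_feats, reference_feats):
    """Return the index of the best-scoring reference and all scores.

    query_feats: list of (C, H, W) arrays, one per scale.
    reference_feats: list of such lists, one per reference image.
    """
    scores = [
        fuse([local_cosine_similarity(q, r) for q, r in zip(query_feats, ref)])
        for ref in reference_feats
    ]
    return int(np.argmax(scores)), scores
```

In a retrieval-based setup, the orientation associated with the selected reference image would be taken as the estimate for the query. Scoring every reference as above is the exhaustive baseline that a faster retrieval strategy aims to avoid.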
