TL;DR
LatentFusion introduces an end-to-end differentiable framework that reconstructs and renders latent 3D object representations from few views, enabling accurate 6D pose estimation of unseen objects without requiring object-specific training.
Contribution
The paper presents a novel neural network that reconstructs and renders latent 3D object representations, allowing pose estimation of unseen objects without prior object-specific training.
Findings
Performs competitively on unseen object pose estimation benchmarks.
Generalizes well to new objects with limited reference views.
Provides a new dataset, MOPED, for unseen object pose estimation.
Abstract
Current 6D object pose estimation methods usually require a 3D model for each object. These methods also require additional training in order to incorporate new objects. As a result, they are difficult to scale to a large number of objects and cannot be directly applied to unseen objects. We propose a novel framework for 6D pose estimation of unseen objects. We present a network that reconstructs a latent 3D representation of an object using a small number of reference views at inference time. Our network is able to render the latent 3D representation from arbitrary views. Using this neural renderer, we directly optimize for pose given an input image. By training our network with a large number of 3D shapes for reconstruction and rendering, our network generalizes well to unseen objects. We present a new dataset for unseen object pose estimation--MOPED. We evaluate the performance of…
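The step "we directly optimize for pose given an input image" can be illustrated with a toy render-and-compare loop. This is only a minimal sketch under strong assumptions and not the paper's implementation: the "renderer" below is a hand-written 2D rigid transform of a few points rather than a neural renderer over a latent 3D representation, the pose is a planar (theta, tx, ty) instead of a full 6D pose, and the gradient is derived by hand instead of backpropagated through a network. The names POINTS, render, loss_and_grad, and estimate_pose are hypothetical and chosen for illustration.

```python
import math

# Hypothetical toy "object": a few 2D points standing in for the latent representation.
POINTS = [(1.0, 0.0), (0.0, 2.0), (-1.5, 0.5), (0.5, -1.0)]


def render(pose, points=POINTS):
    """Toy differentiable renderer: rotate by theta, then translate by (tx, ty)."""
    theta, tx, ty = pose
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]


def loss_and_grad(pose, observed, points=POINTS):
    """Sum of squared differences between the rendering and the observation,
    together with its analytic gradient with respect to (theta, tx, ty)."""
    theta, tx, ty = pose
    c, s = math.cos(theta), math.sin(theta)
    loss = 0.0
    g_theta = g_tx = g_ty = 0.0
    for (x, y), (u_obs, v_obs) in zip(points, observed):
        u = c * x - s * y + tx
        v = s * x + c * y + ty
        ru, rv = u - u_obs, v - v_obs
        loss += ru * ru + rv * rv
        # Chain rule: du/dtheta = -s*x - c*y and dv/dtheta = c*x - s*y.
        g_theta += 2.0 * (ru * (-s * x - c * y) + rv * (c * x - s * y))
        g_tx += 2.0 * ru
        g_ty += 2.0 * rv
    return loss, (g_theta, g_tx, g_ty)


def estimate_pose(observed, init=(0.0, 0.0, 0.0), lr=0.02, steps=2000):
    """Plain gradient descent on the pose, starting from an initial guess."""
    pose = list(init)
    for _ in range(steps):
        _, grad = loss_and_grad(pose, observed)
        pose = [p - lr * g for p, g in zip(pose, grad)]
    return tuple(pose)


if __name__ == "__main__":
    true_pose = (0.6, 0.3, -0.2)   # assumed ground truth for the demo
    observed = render(true_pose)   # stands in for the input image
    print(estimate_pose(observed))
```

The structure is the point here: render from a pose hypothesis, compare against the observation, and follow the gradient of the mismatch back to the pose. Like any local optimization, the loop depends on a reasonable initial guess; the fixed step size and iteration count are arbitrary choices for this toy problem.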
