RUST: Latent Neural Scene Representations from Unposed Imagery

Mehdi S. M. Sajjadi; Aravindh Mahendran; Thomas Kipf; Etienne Pot; Daniel Duckworth; Mario Lucic; Klaus Greff

arXiv:2211.14306 · cs.CV · March 27, 2023

Open Access

TL;DR

RUST introduces a pose-free neural scene representation method trained solely on RGB images, enabling effective novel view synthesis and explicit pose estimation without requiring ground truth camera poses.

Contribution

The paper proposes RUST, an approach that learns latent scene and pose representations from unposed images, reducing the reliance of neural scene modeling on accurate camera pose data.

Findings

01. RUST achieves view synthesis quality comparable to that of pose-dependent methods.

02. The learned latent pose structure allows meaningful camera transformations.

03. RUST enables large-scale training of neural scene representations without pose supervision.

Abstract

Inferring the structure of 3D scenes from 2D observations is a fundamental challenge in computer vision. Recently popularized approaches based on neural scene representations have achieved tremendous impact and have been applied across a variety of applications. One of the major remaining challenges in this space is training a single model which can provide latent representations which effectively generalize beyond a single scene. Scene Representation Transformer (SRT) has shown promise in this direction, but scaling it to a larger set of diverse scenes is challenging and necessitates accurately posed ground truth data. To address this problem, we propose RUST (Really Unposed Scene representation Transformer), a pose-free approach to novel view synthesis trained on RGB images alone. Our main insight is that one can train a Pose Encoder that peeks at the target image and learns a latent…

Taxonomy

Topics: Advanced Vision and Imaging · Robotics and Sensor-Based Localization · Advanced Neural Network Applications

Methods: Multi-Head Attention · Attention Is All You Need · Layer Normalization · Adam · Softmax · Dropout · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Label Smoothing · Absolute Position Encodings