Continuous 3D Perception Model with Persistent State

Qianqian Wang; Yifei Zhang; Aleksander Holynski; Alexei A. Efros; Angjoo Kanazawa

arXiv:2501.12387 · cs.CV · January 22, 2025

TL;DR

The paper introduces CUT3R, a stateful recurrent model that updates a persistent state with each incoming image and uses that state to produce dense 3D reconstructions online and to infer unobserved parts of the scene.

Contribution

The paper presents a persistent-state model for 3D perception that accepts image sequences of varying length and predicts unseen scene regions, supporting online 3D reconstruction.

Findings

01. Achieves state-of-the-art results on multiple 3D/4D tasks.

02. Effectively infers unobserved scene regions.

03. Handles both static and dynamic scenes with flexible input streams.

Abstract

We present a unified framework capable of solving a broad range of 3D tasks. Our approach features a stateful recurrent model that continuously updates its state representation with each new observation. Given a stream of images, this evolving state can be used to generate metric-scale pointmaps (per-pixel 3D points) for each new input in an online fashion. These pointmaps reside within a common coordinate system, and can be accumulated into a coherent, dense scene reconstruction that updates as new images arrive. Our model, called CUT3R (Continuous Updating Transformer for 3D Reconstruction), captures rich priors of real-world scenes: not only can it predict accurate pointmaps from image observations, but it can also infer unseen regions of the scene by probing at virtual, unobserved views. Our method is simple yet highly flexible, naturally accepting varying lengths of images that may…
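The online loop the abstract describes (update a persistent state per image, emit a pointmap in a shared coordinate system, accumulate the pointmaps into a growing reconstruction) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `ToyStatefulModel`, `step`, and `reconstruct_online` are hypothetical names, and the state update and pointmap are crude placeholders, not CUT3R's network or API.

```python
import numpy as np


class ToyStatefulModel:
    """Hypothetical stand-in for a recurrent 3D perception model (not CUT3R)."""

    def __init__(self, state_dim=8):
        self.state = np.zeros(state_dim)  # persistent state carried across frames
        self.frames_seen = 0

    def step(self, image):
        """Update the state with one image; return a pointmap of shape (H, W, 3)."""
        h, w = image.shape[:2]
        # Placeholder state update (assumption): running mean of a crude feature.
        feature = np.resize(image.mean(axis=(0, 1)), self.state.shape)
        self.frames_seen += 1
        self.state += (feature - self.state) / self.frames_seen
        # Placeholder pointmap (assumption): a flat plane at depth 1, shifted
        # along x for each frame so all frames share one coordinate system.
        ys, xs = np.mgrid[0:h, 0:w].astype(float)
        zs = np.ones((h, w))
        return np.stack([xs + (self.frames_seen - 1) * w, ys, zs], axis=-1)


def reconstruct_online(model, image_stream):
    """Accumulate per-frame pointmaps into one growing point cloud."""
    cloud = np.empty((0, 3))
    for image in image_stream:
        pointmap = model.step(image)  # (H, W, 3), already in the shared frame
        cloud = np.vstack([cloud, pointmap.reshape(-1, 3)])
        yield cloud  # reconstruction after this frame
```

Iterating over `reconstruct_online(ToyStatefulModel(), frames)` yields one point cloud per input frame, each containing every point predicted so far. Because the pointmaps already share a coordinate system, accumulation is plain concatenation with no per-frame alignment step.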

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization · Image Retrieval and Classification Techniques

Methods: Attention Is All You Need · Adam · Softmax · Absolute Position Encodings · Residual Connection · Dropout · Byte Pair Encoding · Linear Layer · Multi-Head Attention · Position-Wise Feed-Forward Layer