Memorize What Matters: Emergent Scene Decomposition from Multitraverse
Yiming Li, Zehong Wang, Yue Wang, Zhiding Yu, Zan Gojcic, Marco Pavone, Chen Feng, Jose M. Alvarez

TL;DR
This paper introduces 3D Gaussian Mapping (3DGM), a self-supervised framework that decomposes environments into persistent and ephemeral elements from multitraverse RGB videos, enhancing robotic perception and mapping.
Contribution
The paper presents a novel self-supervised, camera-only 3D mapping method that jointly performs environment-object decomposition and ephemeral object segmentation from multitraverse videos.
Findings
Effective environment-object decomposition demonstrated
Achieved accurate 2D segmentation and 3D reconstruction
Validated on the new Mapverse benchmark
Abstract
Humans naturally retain memories of permanent elements, while ephemeral moments often slip through the cracks of memory. This selective retention is crucial for robotic perception, localization, and mapping. To endow robots with this capability, we introduce 3D Gaussian Mapping (3DGM), a self-supervised, camera-only offline mapping framework grounded in 3D Gaussian Splatting. 3DGM converts multitraverse RGB videos from the same region into a Gaussian-based environmental map while concurrently performing 2D ephemeral object segmentation. Our key observation is that the environment remains consistent across traversals, while objects frequently change. This allows us to exploit self-supervision from repeated traversals to achieve environment-object decomposition. More specifically, 3DGM formulates multitraverse environmental mapping as a robust differentiable rendering problem, treating…
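The key observation above (the environment stays consistent across traversals while objects change) can be illustrated with a toy sketch. This is not the 3DGM method: it swaps 3D Gaussian Splatting and differentiable rendering for a simple per-pixel median, and it assumes the frames are already pixel-aligned, which real multitraverse videos are not. The function name `decompose` and the `threshold` parameter are hypothetical, chosen only for illustration.

```python
from statistics import median


def decompose(traversals, threshold=0.2):
    """Split aligned grayscale frames into a persistent estimate and ephemeral masks.

    traversals: list of frames, each a list of rows of floats in [0, 1].
    All frames are assumed to have the same size and to be pixel-aligned
    (a strong simplifying assumption made only for this sketch).
    """
    height, width = len(traversals[0]), len(traversals[0][0])

    # Per-pixel median across traversals: a consensus estimate that is robust
    # to objects appearing in only a minority of the traversals.
    environment = [
        [median(frame[r][c] for frame in traversals) for c in range(width)]
        for r in range(height)
    ]

    # Pixels whose residual against the consensus exceeds the threshold
    # are flagged as ephemeral (outliers) in that traversal.
    masks = [
        [
            [abs(frame[r][c] - environment[r][c]) > threshold for c in range(width)]
            for r in range(height)
        ]
        for frame in traversals
    ]
    return environment, masks


# Three traversals of a 1x4 "scene"; a bright object appears only in the second one.
frames = [
    [[0.1, 0.1, 0.5, 0.5]],
    [[0.1, 0.9, 0.5, 0.5]],
    [[0.1, 0.1, 0.5, 0.5]],
]
environment, masks = decompose(frames)
```

In this example the consensus `environment` recovers the background row, and only the second traversal's mask flags the pixel where the transient object appeared. The median stands in for the robust estimation idea: content that recurs across traversals is kept, and content that disagrees with the consensus is treated as an outlier.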
Taxonomy
Topics: Video Analysis and Summarization · Topic Modeling
