Lift, Splat, Map: Lifting Foundation Masks for Label-Free Semantic Scene Completion

Arthur Zhang; Rainier Heijne; Joydeep Biswas

arXiv:2407.03425 · cs.CV · July 8, 2024

TL;DR

This paper introduces LSMap, a real-time, label-free method that lifts foundation model masks to produce a continuous, open-set semantic and elevation-aware bird's eye view representation for urban scene understanding, including occluded areas.
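A continuous representation stores a feature vector in each BEV cell rather than a fixed class index, which is what lets categories be picked out without pre-enumerating them. The sketch below illustrates one generic way to query such a map by example, using cosine similarity against a reference embedding. It is a minimal sketch under stated assumptions, not the paper's method: the toy grid, the 3-dimensional embeddings, the 0.9 threshold, and the helper names `cosine` and `query_cells` are all hypothetical.

```python
import math

# Hypothetical helpers for illustration only; not taken from the LSMap paper or code.


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def query_cells(bev, query, threshold):
    """Return (row, col) indices of BEV cells whose embedding is similar to `query`."""
    return [
        (i, j)
        for i, row in enumerate(bev)
        for j, emb in enumerate(row)
        if cosine(emb, query) >= threshold
    ]


# Toy 2x2 BEV map with one (hypothetical) 3-D embedding per cell.
bev = [
    [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
    [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
]

# Query by example: find every cell that resembles cell (0, 0).
matches = query_cells(bev, bev[0][0], threshold=0.9)
print(matches)  # [(0, 0), (0, 1)]
```

Because the query is just another vector, the same map can be searched for a category that was never named during training, whereas a discrete SSC output would need that class index fixed in advance.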

Contribution

The paper presents LSMap, a novel approach that leverages foundation model masks to perform semantic scene completion without human labels, improving robustness to occlusions and enabling open-set scene understanding.

Findings

01. Outperforms existing models on semantic and elevation scene completion tasks.
02. Pre-trained representations surpass existing foundation models in unsupervised scene completion.
03. Operates in real time using only a single RGBD image.

Abstract

Autonomous mobile robots deployed in urban environments must be context-aware, i.e., able to distinguish between different semantic entities, and robust to occlusions. Current approaches like semantic scene completion (SSC) require pre-enumerating the set of classes and costly human annotations, while representation learning methods relax these assumptions but are not robust to occlusions and learn representations tailored towards auxiliary tasks. To address these limitations, we propose LSMap, a method that lifts masks from visual foundation models to predict a continuous, open-set semantic and elevation-aware representation in bird's eye view (BEV) for the entire scene, including regions underneath dynamic entities and in occluded areas. Our model only requires a single RGBD image, does not require human labels, and operates in real time. We quantitatively demonstrate our approach…
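The abstract describes lifting foundation-model masks from a single RGBD image into a BEV representation with elevation. The sketch below shows only the geometric lift-and-splat step for pixels the camera actually observes, assuming a pinhole camera mounted level at a known height (no pitch or roll), metric depth along the optical axis, flat ground, and one integer mask ID per pixel. Each pixel is back-projected to 3D and binned into a BEV grid; each cell keeps its maximum elevation and a majority-vote mask ID, both of which are illustrative choices. The function name, parameters, and toy inputs are hypothetical; this is not the authors' implementation, and it does not perform the scene completion itself.

```python
import math
from collections import Counter


def lift_and_splat(depth, mask_ids, intrinsics, cam_height, cell_size, grid_w, grid_d):
    """Lift RGBD pixels to 3D and splat them into a BEV grid (hypothetical helper).

    depth:      [row][col] metric depth along the optical axis; 0 marks invalid.
    mask_ids:   [row][col] foundation-model mask ID per pixel; -1 marks no mask.
    intrinsics: (fx, fy, cx, cy) of an assumed pinhole camera.
    Returns per-cell max elevation and majority mask ID (None if unobserved).
    """
    fx, fy, cx, cy = intrinsics
    elevation = [[None] * grid_w for _ in range(grid_d)]
    votes = [[Counter() for _ in range(grid_w)] for _ in range(grid_d)]

    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue  # skip missing depth
            # Lift: back-project pixel (u, v) with the pinhole model.
            x = (u - cx) * z / fx  # lateral offset, right is positive
            y = (v - cy) * z / fy  # camera y axis points down
            h = cam_height - y     # height above an assumed flat ground
            # Splat: bin (x, z) into the grid, camera centred laterally.
            gx = math.floor(x / cell_size) + grid_w // 2
            gz = math.floor(z / cell_size)
            if not (0 <= gx < grid_w and 0 <= gz < grid_d):
                continue  # point falls outside the BEV window
            if elevation[gz][gx] is None or h > elevation[gz][gx]:
                elevation[gz][gx] = h
            if mask_ids[v][u] >= 0:
                votes[gz][gx][mask_ids[v][u]] += 1

    # Majority vote per cell; empty counters (unobserved cells) stay None.
    labels = [[c.most_common(1)[0][0] if c else None for c in r] for r in votes]
    return elevation, labels


# Toy 3x3 RGBD frame: depth in metres, one mask ID per pixel.
depth = [
    [2.0, 2.0, 0.0],
    [2.0, 2.0, 2.0],
    [0.0, 0.0, 0.0],
]
masks = [
    [5, 7, -1],
    [5, 7, 7],
    [-1, -1, -1],
]
elev, labels = lift_and_splat(depth, masks, (1.0, 1.0, 1.0, 1.0),
                              cam_height=1.0, cell_size=1.0, grid_w=6, grid_d=4)
print(elev[2], labels[2])  # [None, 3.0, None, 3.0, None, 1.0] [None, 5, None, 7, None, 7]
```

Only 3 of the 24 cells receive any points. The remaining empty cells, including those behind or beneath other objects, are the regions a completion model has to infer rather than observe.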
