Lift, Splat, Map: Lifting Foundation Masks for Label-Free Semantic Scene Completion
Arthur Zhang, Rainier Heijne, Joydeep Biswas

TL;DR
This paper introduces LSMap, a real-time, label-free method that lifts foundation model masks to produce a continuous, open-set semantic and elevation-aware bird's eye view representation for urban scene understanding, including occluded areas.
Contribution
The paper presents LSMap, a novel approach that leverages foundation model masks to perform semantic scene completion without human labels, improving robustness to occlusions and enabling open-set scene understanding.
Findings
LSMap outperforms existing models on semantic and elevation scene completion tasks.
Its pre-trained representations surpass existing foundation models in unsupervised scene completion.
It operates in real time using only a single RGBD image.
Abstract
Autonomous mobile robots deployed in urban environments must be context-aware, i.e., able to distinguish between different semantic entities, and robust to occlusions. Current approaches like semantic scene completion (SSC) require pre-enumerating the set of classes and costly human annotations, while representation learning methods relax these assumptions but are not robust to occlusions and learn representations tailored towards auxiliary tasks. To address these limitations, we propose LSMap, a method that lifts masks from visual foundation models to predict a continuous, open-set semantic and elevation-aware representation in bird's eye view (BEV) for the entire scene, including regions underneath dynamic entities and in occluded areas. Our model only requires a single RGBD image, does not require human labels, and operates in real time. We quantitatively demonstrate our approach…
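The geometric idea of lifting image-space masks into a BEV grid can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `lift_and_splat`, its parameters, the pinhole camera model, the majority vote per cell, and the max-height elevation rule are all assumptions made for illustration. The sketch only covers pixels with valid depth and does not predict occluded regions, which the abstract says LSMap also covers.

```python
from collections import Counter, defaultdict
from math import floor


def lift_and_splat(depth, mask_ids, fx, fy, cx, cy, cell_size=0.5, max_range=10.0):
    """Hypothetical helper: project per-pixel mask ids from an RGBD frame into BEV.

    depth and mask_ids are 2D lists of the same shape (metres, integer ids).
    Returns (semantic, elevation) dicts keyed by (forward_cell, lateral_cell).
    """
    votes = defaultdict(Counter)  # BEV cell -> vote counts per mask id
    elevation = {}                # BEV cell -> highest observed point (metres)
    for v, (depth_row, mask_row) in enumerate(zip(depth, mask_ids)):
        for u, (z, mask_id) in enumerate(zip(depth_row, mask_row)):
            if z is None or z <= 0.0 or z >= max_range:
                continue  # skip invalid or out-of-range depth
            # Lift: pinhole unprojection into the camera frame
            # (x right, y down, z forward).
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            if abs(x) >= max_range:
                continue
            # Splat: drop the vertical axis and bin the point into a BEV cell.
            cell = (floor(z / cell_size), floor((x + max_range) / cell_size))
            votes[cell][mask_id] += 1
            height = -y  # camera y points down, so height is -y
            elevation[cell] = max(elevation.get(cell, height), height)
    # Assumed aggregation rule: each cell takes its most frequent mask id.
    semantic = {cell: counts.most_common(1)[0][0] for cell, counts in votes.items()}
    return semantic, elevation
```

Heights here are relative to the camera origin, and cells that no pixel projects into are simply absent from both dictionaries. A learned completion model would be needed to fill those unobserved cells.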
Taxonomy
Topics: Music and Audio Processing · Handwritten Text Recognition Techniques · Multimodal Machine Learning Applications
Methods: Sparse Evolutionary Training
