TL;DR
FreeOcc is a training-free, open-vocabulary occupancy prediction framework that builds 3D occupancy maps from monocular or RGB-D sequences without 3D annotations or ground-truth camera poses, and outperforms prior self-supervised methods.
Contribution
FreeOcc introduces a training-free pipeline for open-vocabulary occupancy prediction that needs neither 3D annotations nor pose supervision, enabling zero-shot transfer to new environments.
Findings
FreeOcc achieves over 2x improvements in IoU and mIoU on EmbodiedOcc-ScanNet.
FreeOcc outperforms prior self-supervised methods despite being training-free.
FreeOcc transfers zero-shot to novel environments, outperforming supervised baselines.
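For readers unfamiliar with the two metrics named above, the following is a minimal sketch of how IoU and mIoU are commonly computed on voxel grids. It assumes flattened label grids in which label 0 means free space; the function names and the `EMPTY` constant are illustrative, and the benchmark's actual protocol (for example, any visibility masking) is not described in this summary.

```python
from typing import Sequence

EMPTY = 0  # assumed label for free space; semantic classes use labels >= 1


def occupancy_iou(pred: Sequence[int], gt: Sequence[int]) -> float:
    """Geometric IoU: treats every non-empty voxel as 'occupied'."""
    inter = sum(1 for p, g in zip(pred, gt) if p != EMPTY and g != EMPTY)
    union = sum(1 for p, g in zip(pred, gt) if p != EMPTY or g != EMPTY)
    return inter / union if union else 0.0


def mean_iou(pred: Sequence[int], gt: Sequence[int], classes: Sequence[int]) -> float:
    """Semantic mIoU: per-class IoU averaged over classes that appear at all."""
    ious = []
    for c in classes:
        inter = sum(1 for p, g in zip(pred, gt) if p == c and g == c)
        union = sum(1 for p, g in zip(pred, gt) if p == c or g == c)
        if union:  # skip classes absent from both prediction and ground truth
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: six voxels, two semantic classes (1 and 2).
pred = [0, 1, 1, 2, 2, 0]
gt = [0, 1, 2, 2, 0, 0]
print(occupancy_iou(pred, gt))     # 3 shared occupied voxels out of 4 in the union
print(mean_iou(pred, gt, [1, 2]))  # average of 1/2 (class 1) and 1/3 (class 2)
```

IoU here measures geometry only, while mIoU also penalizes assigning the wrong class to a correctly occupied voxel, which is why the two numbers are reported separately.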
Abstract
Existing learning-based occupancy prediction methods rely on large-scale 3D annotations and generalize poorly across environments. We present FreeOcc, a training-free framework for open-vocabulary occupancy prediction from monocular or RGB-D sequences. Unlike prior approaches that require voxel-level supervision and ground-truth camera poses, FreeOcc operates without 3D annotations, pose ground truth, or any learning stage. FreeOcc incrementally builds a globally consistent occupancy map via a four-layer pipeline: a SLAM backbone estimates poses and sparse geometry; a geometrically consistent Gaussian update constructs dense 3D Gaussian maps; open-vocabulary semantics from off-the-shelf vision-language models are associated with Gaussian primitives; and a probabilistic Gaussian-to-occupancy projection produces dense voxel occupancy. Despite being entirely training-free and…
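The last stage of the pipeline, the Gaussian-to-occupancy projection, can be illustrated with a toy sketch. The abstract does not give the paper's formulation, so everything below is an assumption for illustration: isotropic Gaussians, a per-voxel occupancy probability of one minus the product of (1 - opacity × Gaussian falloff), and a semantic label chosen by the largest accumulated weight. The names `Gaussian` and `gaussians_to_occupancy` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Gaussian:
    mean: Tuple[float, float, float]  # centre in metres
    sigma: float                      # isotropic std. dev. (assumption)
    opacity: float                    # in [0, 1]
    label: str                        # open-vocabulary class name


def gaussians_to_occupancy(
    gaussians: List[Gaussian],
    grid_shape: Tuple[int, int, int],
    voxel_size: float,
    threshold: float = 0.5,
) -> Dict[Tuple[int, int, int], Tuple[str, float]]:
    """Return {voxel index: (label, occupancy probability)} for occupied voxels."""
    nx, ny, nz = grid_shape
    occupied = {}
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                center = tuple((idx + 0.5) * voxel_size for idx in (i, j, k))
                p_free = 1.0
                votes: Dict[str, float] = {}
                for g in gaussians:
                    d2 = sum((c - m) ** 2 for c, m in zip(center, g.mean))
                    # Contribution of this Gaussian at the voxel centre.
                    w = g.opacity * math.exp(-0.5 * d2 / g.sigma ** 2)
                    p_free *= 1.0 - w
                    votes[g.label] = votes.get(g.label, 0.0) + w
                p_occ = 1.0 - p_free
                if votes and p_occ >= threshold:
                    occupied[(i, j, k)] = (max(votes, key=votes.get), p_occ)
    return occupied


# One "chair" Gaussian centred in the first of two 10 cm voxels.
demo = gaussians_to_occupancy(
    [Gaussian(mean=(0.05, 0.05, 0.05), sigma=0.05, opacity=0.9, label="chair")],
    grid_shape=(2, 1, 1),
    voxel_size=0.1,
)
print(demo)
```

The brute-force loop over all voxels and all Gaussians is only suitable for tiny grids; a practical system would restrict each Gaussian to the voxels within a few standard deviations of its mean.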
