SGR-OCC: Evolving Monocular Priors for Embodied 3D Occupancy Prediction via Soft-Gating Lifting and Semantic-Adaptive Geometric Refinement
Yiran Guo, Simone Mentasti, Xiaofeng Jin, Matteo Frosi, Matteo Matteucci

TL;DR
This paper introduces SGR-OCC, a novel framework for monocular 3D occupancy prediction that effectively addresses depth ambiguity and cold start issues, achieving state-of-the-art results in embodied scene understanding.
Contribution
The paper proposes a unified method with soft-gating and ray-refinement modules, along with a two-phase training strategy, to improve monocular 3D occupancy prediction in embodied AI.
Findings
Achieves 58.55% IoU in local prediction tasks.
Surpasses previous methods by over 3.6% in key metrics.
Demonstrates superior structural and boundary preservation in complex scenes.
Abstract
3D semantic occupancy prediction is a cornerstone for embodied AI, enabling agents to perceive dense scene geometry and semantics incrementally from monocular video streams. However, current online frameworks face two critical bottlenecks: the inherent depth ambiguity of monocular estimation that causes "feature bleeding" at object boundaries , and the "cold start" instability where uninitialized temporal fusion layers distort high-quality spatial priors during early training stages. In this paper, we propose SGR-OCC (Soft-Gating and Ray-refinement Occupancy), a unified framework driven by the philosophy of "Inheritance and Evolution". To perfectly inherit monocular spatial expertise, we introduce a Soft-Gating Feature Lifter that explicitly models depth uncertainty via a Gaussian gate to probabilistically suppress background noise. Furthermore, a Dynamic Ray-Constrained Anchor…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
Topics3D Shape Modeling and Analysis · Advanced Vision and Imaging · Robotics and Sensor-Based Localization
