SGR-OCC: Evolving Monocular Priors for Embodied 3D Occupancy Prediction via Soft-Gating Lifting and Semantic-Adaptive Geometric Refinement

Yiran Guo; Simone Mentasti; Xiaofeng Jin; Matteo Frosi; Matteo Matteucci

arXiv:2603.14076·cs.CV·March 17, 2026

TL;DR

This paper introduces SGR-OCC, a framework for monocular 3D occupancy prediction that targets two problems, depth ambiguity and "cold start" instability in temporal fusion, and reports state-of-the-art results in embodied scene understanding.

Contribution

The paper proposes a unified method with soft-gating and ray-refinement modules, along with a two-phase training strategy, to improve monocular 3D occupancy prediction in embodied AI.

Findings

01. Achieves 58.55% IoU in local prediction tasks.

02. Surpasses previous methods by over 3.6% in key metrics.

03. Demonstrates superior structural and boundary preservation in complex scenes.

Abstract

3D semantic occupancy prediction is a cornerstone for embodied AI, enabling agents to perceive dense scene geometry and semantics incrementally from monocular video streams. However, current online frameworks face two critical bottlenecks: the inherent depth ambiguity of monocular estimation that causes "feature bleeding" at object boundaries, and the "cold start" instability where uninitialized temporal fusion layers distort high-quality spatial priors during early training stages. In this paper, we propose SGR-OCC (Soft-Gating and Ray-refinement Occupancy), a unified framework driven by the philosophy of "Inheritance and Evolution". To perfectly inherit monocular spatial expertise, we introduce a Soft-Gating Feature Lifter that explicitly models depth uncertainty via a Gaussian gate to probabilistically suppress background noise. Furthermore, a Dynamic Ray-Constrained Anchor…
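The Gaussian gate mentioned in the abstract can be illustrated with a minimal sketch. The excerpt does not give the paper's actual formulation, so everything here is an assumption for illustration: a single pixel with a predicted depth mean `mu` and standard deviation `sigma`, a hypothetical list of `depth_bins` along its camera ray, and an unnormalized Gaussian weight per bin. The function names are invented and are not taken from SGR-OCC.

```python
import math


def gaussian_gate(depth_bins, mu, sigma):
    """Return one gate weight in (0, 1] per depth bin for a single pixel.

    Assumed form (not from the paper): exp(-(d - mu)^2 / (2 * sigma^2)).
    Bins near the predicted depth get weights close to 1; distant bins,
    such as background behind an object boundary, are pushed toward 0.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return [math.exp(-((d - mu) ** 2) / (2.0 * sigma ** 2)) for d in depth_bins]


def lift_pixel_feature(feature, depth_bins, mu, sigma):
    """Place one pixel's feature vector at every depth bin, scaled by its gate."""
    gates = gaussian_gate(depth_bins, mu, sigma)
    return [[g * f for f in feature] for g in gates]


# Hypothetical example: six depth bins (metres) and a 2-channel feature.
depth_bins = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
feature = [1.0, -2.0]
lifted = lift_pixel_feature(feature, depth_bins, mu=1.5, sigma=0.25)

for d, vec in zip(depth_bins, lifted):
    print(f"depth {d:.1f} m -> {[round(v, 4) for v in vec]}")
```

In this sketch a small `sigma` (a confident depth estimate) concentrates the feature around `mu`, while a large `sigma` spreads it along the ray, which is one plausible reading of "soft" gating as opposed to a hard assignment to a single depth.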

Taxonomy

Topics: 3D Shape Modeling and Analysis · Advanced Vision and Imaging · Robotics and Sensor-Based Localization