GEOcc: Geometrically Enhanced 3D Occupancy Network with Implicit-Explicit Depth Fusion and Contextual Self-Supervision

Xin Tan; Wenbin Wu; Zhiwei Zhang; Chaojie Fan; Yong Peng; Zhizhong Zhang; Yuan Xie; Lizhuang Ma

arXiv:2405.10591 · cs.CV · March 25, 2025 · 2 cites

Open Access

TL;DR

GEOcc introduces a novel 3D occupancy perception model that combines explicit and implicit depth modeling, along with self-supervision, to improve accuracy and generalizability in vision-only autonomous driving systems.

Contribution

The paper proposes a new geometric-enhanced occupancy network with depth fusion and self-supervised learning, achieving state-of-the-art results with minimal image resolution and lightweight backbone.

Findings

01 Achieves a 3.3% improvement on the Occ3D-nuScenes dataset.

02 Outperforms baseline models in accuracy and robustness.

03 Uses fewer computational resources thanks to a lightweight backbone.

Abstract

3D occupancy perception plays a pivotal role in recent vision-centric autonomous driving systems by converting surround-view images into integrated geometric and semantic representations within dense 3D grids. Nevertheless, current models still face two main challenges: modeling depth accurately in the 2D-3D view transformation stage, and overcoming the limited generalizability caused by sparse LiDAR supervision. To address these issues, this paper presents GEOcc, a Geometric-Enhanced Occupancy network tailored for vision-only surround-view perception. Our approach is three-fold: 1) Integration of explicit lift-based depth prediction and implicit projection-based transformers for depth modeling, enhancing the density and robustness of view transformation; 2) Utilization of a mask-based encoder-decoder architecture for fine-grained semantic predictions; 3) Adoption of…
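The "explicit lift-based depth prediction" mentioned in the abstract can be illustrated with a minimal sketch of the generic lift step used in lift-style view transformation. This is not GEOcc's implementation: the function names `softmax` and `lift_pixel`, the toy sizes (3 depth bins, 2 channels), and the input values are all assumptions made for illustration. The idea shown is that each pixel predicts a probability distribution over discrete depth bins, and its feature vector is spread along the camera ray in proportion to those probabilities.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def lift_pixel(depth_logits, context):
    """Lift one pixel's feature vector into D depth bins (hypothetical helper).

    Returns a D x C nested list: each depth bin receives the context
    feature scaled by the predicted probability of that bin.
    """
    probs = softmax(depth_logits)
    return [[p * c for c in context] for p in probs]


# Toy input (assumed values): 3 depth bins with equal logits, 2 feature channels.
frustum = lift_pixel([0.0, 0.0, 0.0], [2.0, 4.0])
```

Because the depth probabilities sum to one, summing the lifted features over all depth bins recovers the original feature vector. A sharply peaked depth distribution places the feature at essentially one depth, while a flat one smears it along the whole ray.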

Taxonomy

Topics: 3D Shape Modeling and Analysis · Computer Graphics and Visualization Techniques · Advanced Vision and Imaging