O3N: Omnidirectional Open-Vocabulary Occupancy Prediction
Mengfei Duan, Hao Shi, Fei Teng, Guoqiang Zhao, Yuheng Zhang, Zhiyong Li, Kailun Yang

TL;DR
O3N introduces a novel omnidirectional, open-vocabulary 3D occupancy prediction framework that integrates geometric and semantic understanding for comprehensive scene perception in autonomous agents.
Contribution
It presents the first purely visual, end-to-end omnidirectional occupancy prediction model with a polar-spiral voxel topology and a unified semantic-geometry supervision mechanism.
Findings
Achieves state-of-the-art results on QuadOcc and Human360Occ benchmarks.
Demonstrates strong cross-scene generalization.
Shows effective semantic scalability in 3D scene modeling.
Abstract
Understanding and reconstructing the 3D world through omnidirectional perception is an inevitable trend in the development of autonomous agents and embodied intelligence. However, existing 3D occupancy prediction methods are constrained by limited perspective inputs and predefined training distribution, making them difficult to apply to embodied agents that require comprehensive and safe perception of scenes in open world exploration. To address this, we present O3N, the first purely visual, end-to-end Omnidirectional Open-vocabulary Occupancy predictioN framework. O3N embeds omnidirectional voxels in a polar-spiral topology via the Polar-spiral Mamba (PsM) module, enabling continuous spatial representation and long-range context modeling across 360°. The Occupancy Cost Aggregation (OCA) module introduces a principled mechanism for unifying geometric and semantic supervision within…
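To make the idea of a polar-spiral voxel ordering more concrete, the following is a minimal sketch and not the authors' PsM implementation. It assumes a bird's-eye-view grid split into concentric rings and angular sectors, and flattens the cells ring by ring so that a sequence model would read them as one outward spiral. The function names, the grid sizes, and the ring-major ordering are all assumptions made for illustration.

```python
import math


def polar_spiral_order(num_rings, num_sectors):
    """List (ring, sector) cells in an outward, ring-by-ring spiral order.

    Hypothetical helper: the paper's actual scan order may differ.
    """
    return [(r, s) for r in range(num_rings) for s in range(num_sectors)]


def cartesian_to_polar_cell(x, y, max_radius, num_rings, num_sectors):
    """Map a BEV point (x, y) to its (ring, sector) cell, or None if out of range."""
    radius = math.hypot(x, y)
    if radius >= max_radius:
        return None
    # Wrap the angle into [0, 2*pi) so sector 0 starts at the +x axis.
    angle = math.atan2(y, x) % (2 * math.pi)
    ring = int(radius / max_radius * num_rings)
    # The final modulo guards against angle rounding up to exactly 2*pi.
    sector = int(angle / (2 * math.pi) * num_sectors) % num_sectors
    return ring, sector


def spiral_index(ring, sector, num_sectors):
    """Position of a cell in the flattened spiral sequence."""
    return ring * num_sectors + sector


cell = cartesian_to_polar_cell(0.0, 5.0, max_radius=10.0, num_rings=5, num_sectors=8)
position = spiral_index(cell[0], cell[1], num_sectors=8)
```

In this ordering the last sector of one ring is followed by the first sector of the next ring, so consecutive sequence elements stay close across the 0°/360° seam. That is one plausible reading of "continuous spatial representation" across 360°, stated here as an assumption.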
Taxonomy
Topics: 3D Shape Modeling and Analysis · Robotics and Sensor-Based Localization · Advanced Vision and Imaging
