SliceOcc: Indoor 3D Semantic Occupancy Prediction with Vertical Slice Representation
Jianing Li, Ming Lu, Hao Wang, Chenyang Gu, Wenzhao Zheng, Li Du, Shanghang Zhang

TL;DR
This paper introduces SliceOcc, a vertical slice representation and accompanying model for indoor 3D semantic occupancy prediction from RGB images. It outperforms existing planar-based methods, especially in dense, occluded environments.
Contribution
The paper proposes a new vertical slice scene representation and a tailored RGB-based model, SliceOcc, for improved indoor 3D semantic occupancy prediction.
Findings
Achieves 15.45% mIoU on the EmbodiedScan dataset
Sets a new state-of-the-art among RGB camera-based models
Effective in dense indoor environments with occlusions
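For readers unfamiliar with the metric, mIoU (mean intersection over union) averages the per-class overlap between predicted and ground-truth voxel labels. The snippet below is a minimal sketch of the generic definition only; the function name `mean_iou`, the flattened label lists, and the `ignore_index` parameter are illustrative assumptions, and the exact EmbodiedScan evaluation protocol (class list, handling of unobserved voxels) is not described in this summary.

```python
def mean_iou(pred, target, num_classes, ignore_index=None):
    """Generic mIoU over flattened voxel labels (hypothetical helper).

    pred, target: equal-length sequences of integer class ids.
    Classes that appear in neither pred nor target are skipped.
    """
    inter = [0] * num_classes  # voxels where prediction and label agree
    union = [0] * num_classes  # TP + FP + FN per class
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # assumed convention: skip unlabeled voxels
        if p == t:
            inter[p] += 1
            union[p] += 1
        else:
            union[p] += 1  # false positive for the predicted class
            union[t] += 1  # false negative for the true class
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


# Toy example: four voxels, three classes.
score = mean_iou([0, 1, 1, 2], [0, 1, 2, 2], num_classes=3)
```

In the toy example, class 0 is matched perfectly while classes 1 and 2 each share one mislabeled voxel, so the mean is pulled down by the two partially correct classes.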
Abstract
3D semantic occupancy prediction is a crucial task in visual perception, as it requires the simultaneous comprehension of both scene geometry and semantics. It plays a crucial role in understanding 3D scenes and has great potential for various applications, such as robotic vision perception and autonomous driving. Many existing works utilize planar-based representations such as Bird's Eye View (BEV) and Tri-Perspective View (TPV). These representations aim to simplify the complexity of 3D scenes while preserving essential object information, thereby facilitating efficient scene representation. However, in dense indoor environments with prevalent occlusions, directly applying these planar-based methods often leads to difficulties in capturing global semantic occupancy, ultimately degrading model performance. In this paper, we present a new vertical slice representation that divides the…
Taxonomy
Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Video Analysis and Summarization
