LOMA: Language-assisted Semantic Occupancy Network via Triplane Mamba
Yubo Cui, Zhiheng Li, Jiaqiang Wang, Zheng Fang

TL;DR
LOMA introduces a vision-language framework with a scene generator and a tri-plane fusion block to improve 3D semantic occupancy prediction in outdoor scenes. It targets two problems: images alone carry limited geometric information, and quadratic-cost attention restricts feature fusion to local interactions.
Contribution
The paper proposes a novel language-assisted 3D occupancy prediction network with a scene generator and an efficient fusion module, enhancing geometric understanding and semantic fusion.
Findings
Achieves state-of-the-art results on SemanticKITTI and SSCBench-KITTI360 datasets.
Effectively fuses vision and language features with reduced computational cost.
Improves 3D semantic and geometric completion performance.
Abstract
Vision-based 3D occupancy prediction has become a popular research task due to its versatility and affordability. Nowadays, conventional methods usually project the image-based vision features to 3D space and learn the geometric information through the attention mechanism, enabling the 3D semantic occupancy prediction. However, these works usually face two main challenges: 1) Limited geometric information. Due to the lack of geometric information in the image itself, it is challenging to directly predict 3D space information, especially in large-scale outdoor scenes. 2) Local restricted interaction. Due to the quadratic complexity of the attention mechanism, they often use modified local attention to fuse features, resulting in a restricted fusion. To address these problems, in this paper, we propose a language-assisted 3D semantic occupancy prediction network, named LOMA. In the…
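The cost argument in the abstract can be made concrete with a little arithmetic. The sketch below is illustrative only and is not taken from the paper: the 256 × 256 × 32 grid size and the helper names are assumptions, and collapsing the volume onto three axis-aligned planes is a generic reading of "tri-plane", not a confirmed description of LOMA's fusion block.

```python
def attention_pair_count(num_tokens: int) -> int:
    """Pairwise interactions computed by full self-attention: N^2."""
    return num_tokens * num_tokens


def triplane_token_count(x: int, y: int, z: int) -> int:
    """Tokens left after collapsing a voxel grid onto its XY, XZ and YZ planes."""
    return x * y + x * z + y * z


# Assumed voxel grid size, chosen for illustration only.
X, Y, Z = 256, 256, 32

voxel_tokens = X * Y * Z                      # one token per voxel
plane_tokens = triplane_token_count(X, Y, Z)  # one token per plane cell

# Full attention over every voxel scales quadratically with the token count.
full_attention_pairs = attention_pair_count(voxel_tokens)

# A linear-time sequence model visits each token once, so its step count
# grows with the number of tokens rather than its square.
linear_scan_steps = plane_tokens

print(f"voxel tokens:         {voxel_tokens:,}")
print(f"tri-plane tokens:     {plane_tokens:,}")
print(f"full attention pairs: {full_attention_pairs:,}")
print(f"linear scan steps:    {linear_scan_steps:,}")
```

Under these assumptions, the tri-plane layout holds about 25 times fewer tokens than the full voxel grid, and a linear-time scan over them avoids the quadratic term that pushes attention-based methods toward local windows.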
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
Methods: Softmax · Attention Is All You Need · Mamba: Linear-Time Sequence Modeling with Selective State Spaces
