LOMA: Language-assisted Semantic Occupancy Network via Triplane Mamba

Yubo Cui; Zhiheng Li; Jiaqiang Wang; Zheng Fang

arXiv:2412.08388 · cs.CV · December 12, 2024

TL;DR

LOMA introduces a vision-language framework with a scene generator and a tri-plane fusion block to improve 3D semantic occupancy prediction, addressing limited geometric information and locally restricted feature interaction in outdoor scenes.
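To make the tri-plane idea concrete, here is a minimal NumPy sketch under stated assumptions: a voxel feature volume of shape (C, X, Y, Z) is collapsed onto the XY, XZ and YZ planes by mean pooling, and the planes are then broadcast back along their missing axis and summed. The helper names `voxel_to_triplane` and `triplane_to_voxel`, and the choice of mean pooling and summation, are hypothetical; this summary does not describe how LOMA's tri-plane fusion block actually builds or fuses its planes.

```python
import numpy as np

# Hypothetical helpers for illustration only; not LOMA's code.


def voxel_to_triplane(vox: np.ndarray):
    """Collapse a (C, X, Y, Z) voxel volume onto three axis-aligned planes.

    Mean pooling is an assumed, illustrative choice; a learned block could differ.
    """
    xy = vox.mean(axis=3)  # (C, X, Y): averaged over height Z
    xz = vox.mean(axis=2)  # (C, X, Z): averaged over Y
    yz = vox.mean(axis=1)  # (C, Y, Z): averaged over X
    return xy, xz, yz


def triplane_to_voxel(xy: np.ndarray, xz: np.ndarray, yz: np.ndarray) -> np.ndarray:
    """Broadcast each plane along its missing axis and sum into (C, X, Y, Z)."""
    return xy[:, :, :, None] + xz[:, :, None, :] + yz[:, None, :, :]


# Usage: an 8-channel 32 x 32 x 4 volume.
vox = np.random.default_rng(0).standard_normal((8, 32, 32, 4))
xy, xz, yz = voxel_to_triplane(vox)
fused = triplane_to_voxel(xy, xz, yz)
print(xy.shape, xz.shape, yz.shape, fused.shape)
# (8, 32, 32) (8, 32, 4) (8, 32, 4) (8, 32, 32, 4)
```

The appeal of the factorization is size: for a 256 × 256 × 32 grid (the completion resolution used on SemanticKITTI), the three planes hold 65,536 + 8,192 + 8,192 = 81,920 cells per channel versus 2,097,152 voxels, roughly 25.6× fewer.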

Contribution

The paper proposes a novel language-assisted 3D occupancy prediction network with a scene generator and an efficient fusion module, enhancing geometric understanding and semantic fusion.

Findings

1. Achieves state-of-the-art results on the SemanticKITTI and SSCBench-KITTI360 datasets.
2. Fuses vision and language features effectively at reduced computational cost.
3. Improves both 3D semantic and geometric completion performance.
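The efficiency finding presumably comes from the Mamba component named in the title, which replaces quadratic attention with a selective state-space scan that visits each token once. As rough arithmetic, full self-attention over a flattened 256 × 256 plane (L = 65,536 tokens) scores L² = 4,294,967,296 pairs, while a scan performs L recurrent updates. The sketch below is a simplified, sequential selective scan in NumPy, not the hardware-aware Mamba kernel and not LOMA's code: the step size Δ, input matrix B and output matrix C are computed from the current input (the "selective" part), A holds positive decay rates discretized as exp(−Δ·A), and B uses the simplified Δ·B discretization. The names `selective_scan`, `W_dt`, `W_B` and `W_C` are hypothetical.

```python
import numpy as np

# Hypothetical, simplified selective scan for illustration; not LOMA's code
# and not the fused Mamba CUDA kernel.


def softplus(z: np.ndarray) -> np.ndarray:
    # Numerically stable log(1 + exp(z)); keeps the step size positive.
    return np.logaddexp(0.0, z)


def selective_scan(x, A, W_dt, W_B, W_C):
    """Sequential selective SSM scan.

    x:    (L, D) input sequence, e.g. a flattened feature plane
    A:    (D, N) positive decay rates, one per channel and state dimension
    W_dt: (D, D) projection giving a per-channel step size from the input
    W_B, W_C: (D, N) projections giving input-dependent B_t and C_t
    Returns y: (L, D). Cost is linear in L: one state update per token.
    """
    L, D = x.shape
    N = A.shape[1]
    h = np.zeros((D, N))                  # hidden state carried along the sequence
    y = np.empty_like(x)
    for t in range(L):
        dt = softplus(x[t] @ W_dt)        # (D,)  input-dependent step size
        B = x[t] @ W_B                    # (N,)  input-dependent input matrix
        C = x[t] @ W_C                    # (N,)  input-dependent output matrix
        A_bar = np.exp(-dt[:, None] * A)  # (D, N) decay factors, each in (0, 1]
        B_bar = dt[:, None] * B[None, :]  # (D, N) simplified discretization of B
        h = A_bar * h + B_bar * x[t][:, None]
        y[t] = h @ C                      # (D,)  read out each channel
    return y


# Usage on random data.
rng = np.random.default_rng(0)
L, D, N = 16, 4, 2
x = rng.standard_normal((L, D))
A = np.abs(rng.standard_normal((D, N))) + 0.1
W_dt = 0.1 * rng.standard_normal((D, D))
W_B = 0.1 * rng.standard_normal((D, N))
W_C = 0.1 * rng.standard_normal((D, N))
y = selective_scan(x, A, W_dt, W_B, W_C)
print(y.shape)  # (16, 4)
```

Because the state is updated strictly left to right, output t depends only on inputs 0 through t; applying such a scan to a 2D plane therefore requires choosing a flattening order, a design decision this summary does not specify for LOMA.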

Abstract

Vision-based 3D occupancy prediction has become a popular research task due to its versatility and affordability. Conventional methods typically project image-based vision features into 3D space and learn geometric information through the attention mechanism, enabling 3D semantic occupancy prediction. However, these works face two main challenges: 1) Limited geometric information. Because an image itself lacks explicit geometric information, directly predicting 3D spatial information is challenging, especially in large-scale outdoor scenes. 2) Locally restricted interaction. Due to the quadratic complexity of the attention mechanism, these methods often use modified local attention to fuse features, which restricts the fusion. To address these problems, we propose a language-assisted 3D semantic occupancy prediction network, named LOMA. In the…
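The first step the abstract describes, projecting image features into 3D space, is commonly done by projecting each voxel center into the image with the camera intrinsics and extrinsics and sampling the feature map at the resulting pixel. The sketch below assumes a pinhole camera with intrinsic matrix K, a 4 × 4 world-to-camera transform, and nearest-neighbor sampling (many methods use bilinear sampling instead); the helpers `project_voxels` and `lift_features` are hypothetical and are not taken from LOMA.

```python
import numpy as np

# Hypothetical helpers for illustration only; not LOMA's code.


def project_voxels(centers, K, T_cam_from_world, img_hw):
    """Project (M, 3) world-frame voxel centers to integer pixels.

    Returns (M, 2) pixel coordinates (u, v) and an (M,) boolean validity mask.
    """
    M = centers.shape[0]
    homo = np.hstack([centers, np.ones((M, 1))])   # (M, 4) homogeneous points
    cam = (T_cam_from_world @ homo.T).T[:, :3]     # (M, 3) camera-frame points
    z = cam[:, 2]
    in_front = z > 1e-6                            # drop points behind the camera
    uvw = (K @ cam.T).T                            # (M, 3) pinhole projection
    safe_z = np.where(in_front, z, 1.0)            # avoid dividing by z <= 0
    u = np.floor(uvw[:, 0] / safe_z).astype(int)
    v = np.floor(uvw[:, 1] / safe_z).astype(int)
    H, W = img_hw
    valid = in_front & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    return np.stack([u, v], axis=1), valid


def lift_features(feat, centers, K, T_cam_from_world):
    """Give each voxel the (C,) feature at its projected pixel (nearest neighbor)."""
    C, H, W = feat.shape
    uv, valid = project_voxels(centers, K, T_cam_from_world, (H, W))
    out = np.zeros((centers.shape[0], C))          # invalid voxels stay zero
    out[valid] = feat[:, uv[valid, 1], uv[valid, 0]].T  # rows index v, columns index u
    return out


# Usage: identity extrinsics, a 100 x 80 (W x H) image, three voxel centers.
K = np.array([[100.0, 0.0, 50.0], [0.0, 100.0, 40.0], [0.0, 0.0, 1.0]])
T = np.eye(4)
centers = np.array([[0.0, 0.0, 2.0],    # straight ahead, lands on pixel (50, 40)
                    [0.0, 0.0, -1.0],   # behind the camera, invalid
                    [10.0, 0.0, 1.0]])  # projects to u = 1050, outside the image
feat = np.arange(2 * 80 * 100, dtype=float).reshape(2, 80, 100)
uv, valid = project_voxels(centers, K, T, (80, 100))
lifted = lift_features(feat, centers, K, T)
print(valid.tolist(), lifted[0].tolist())  # [True, False, False] [4050.0, 12050.0]
```

With this kind of lifting, every voxel along the same camera ray receives the same image feature, which illustrates the abstract's first challenge: the image alone does not say where along the ray the surface lies.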

Videos

LOMA: Language-assisted Semantic Occupancy Network via Triplane Mamba (Underline)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: Softmax · Attention Is All You Need · Mamba: Linear-Time Sequence Modeling with Selective State Spaces