SGIFormer: Semantic-guided and Geometric-enhanced Interleaving Transformer for 3D Instance Segmentation
Lei Yao, Yi Wang, Moyun Liu, Lap-Pui Chau

TL;DR
SGIFormer introduces a novel transformer-based approach for 3D instance segmentation, leveraging semantic-guided query initialization and geometric-enhanced decoding to improve accuracy and efficiency in large-scale 3D scenes.
Contribution
The paper proposes SGIFormer, combining semantic-guided query initialization and a geometric-enhanced interleaving transformer decoder for improved 3D instance segmentation.
Findings
Achieves state-of-the-art results on ScanNet V2 and ScanNet200 datasets.
Balances accuracy and efficiency effectively.
Demonstrates robustness on the challenging high-fidelity ScanNet++ benchmark.
Abstract
In recent years, transformer-based models have exhibited considerable potential in point cloud instance segmentation. Despite the promising performance achieved by existing methods, they encounter challenges such as instance query initialization problems and excessive reliance on stacked layers, rendering them incompatible with large-scale 3D scenes. This paper introduces a novel method, named SGIFormer, for 3D instance segmentation, which is composed of the Semantic-guided Mix Query (SMQ) initialization and the Geometric-enhanced Interleaving Transformer (GIT) decoder. Specifically, the principle of our SMQ initialization scheme is to leverage the predicted voxel-wise semantic information to implicitly generate the scene-aware query, yielding adequate scene prior and compensating for the learnable query set. Subsequently, we feed the formed overall query into our GIT decoder to…
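As a rough illustration of the SMQ idea described in the abstract, the sketch below builds a mixed query set by selecting voxels with confident foreground semantic predictions and concatenating their features with a learnable query set. This is only one possible reading of "semantic-guided" initialization, not the authors' implementation: the function names, the top-k selection rule, the `background_class` index, and the use of raw voxel features as queries are all assumptions made for the example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one voxel's class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mix_queries(voxel_feats, sem_logits, learnable_queries,
                num_scene_queries, background_class=0):
    """Hypothetical sketch of a semantic-guided mixed query set.

    voxel_feats:       list of per-voxel feature vectors
    sem_logits:        list of per-voxel semantic class logits
    learnable_queries: list of scene-independent query vectors
    Returns (mixed_queries, selected_voxel_indices).
    """
    # Assumption: a voxel's foreground score is 1 - P(background).
    fg_scores = [1.0 - softmax(row)[background_class] for row in sem_logits]

    # Assumption: keep the k most confidently foreground voxels.
    k = min(num_scene_queries, len(voxel_feats))
    top_idx = sorted(range(len(fg_scores)), key=lambda i: -fg_scores[i])[:k]

    # Scene-aware queries come from the selected voxel features;
    # the learnable queries are appended unchanged.
    scene_queries = [list(voxel_feats[i]) for i in top_idx]
    mixed = scene_queries + [list(q) for q in learnable_queries]
    return mixed, top_idx
```

In this sketch the scene-aware queries supply a per-scene prior while the learnable queries remain independent of the input, which mirrors the abstract's description of the scene-aware query "compensating for the learnable query set". A real model would presumably project the selected features through learned layers before decoding.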
Taxonomy
Topics: 3D Shape Modeling and Analysis · Image Processing and 3D Reconstruction · Advanced Neural Network Applications
Methods: Attention Is All You Need · Residual Connection · Byte Pair Encoding · Layer Normalization · Label Smoothing · Linear Layer · Adam · Dropout · Multi-Head Attention · Dense Connections
