Beyond the Final Layer: Hierarchical Query Fusion Transformer with Agent-Interpolation Initialization for 3D Instance Segmentation
Jiahao Lu, Jiacheng Deng, Tianzhu Zhang

TL;DR
This paper introduces BFL, a transformer-based approach for 3D instance segmentation that improves query initialization and preserves object recall across decoder layers, outperforming existing methods on ScanNetV2, ScanNet200, ScanNet++, and S3DIS.
Contribution
The paper proposes an Agent-Interpolation Initialization Module and a Hierarchical Query Fusion Decoder to improve query resilience and layer-wise object recall in 3D instance segmentation.
Findings
BFL outperforms existing methods on ScanNetV2, ScanNet200, ScanNet++, and S3DIS datasets.
The proposed modules effectively balance foreground coverage and content learning.
With the hierarchical query fusion approach, deep decoder layers maintain higher object recall.
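To make the layer-wise recall finding concrete, the sketch below shows one common way to measure object recall at each decoder layer: a ground-truth instance counts as recalled if at least one predicted mask overlaps it with IoU at or above a threshold. The paper's exact recall definition is not stated on this page, so the function names, the 0.5 threshold, and the point-wise boolean mask layout are all assumptions made for illustration.

```python
import numpy as np


def mask_iou(a, b):
    """IoU between two boolean per-point masks of equal length."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union > 0 else 0.0


def object_recall(pred_masks, gt_masks, iou_thresh=0.5):
    """Fraction of ground-truth instances matched by any prediction.

    iou_thresh=0.5 is an assumed value, not taken from the paper.
    """
    if len(gt_masks) == 0:
        raise ValueError("need at least one ground-truth instance")
    hits = 0
    for gt in gt_masks:
        if any(mask_iou(p, gt) >= iou_thresh for p in pred_masks):
            hits += 1
    return hits / len(gt_masks)


def layerwise_recall(per_layer_preds, gt_masks, iou_thresh=0.5):
    """Recall for each decoder layer's list of predicted masks."""
    return [object_recall(p, gt_masks, iou_thresh) for p in per_layer_preds]


# Toy scene with 6 points and 2 ground-truth instances.
gt = [
    np.array([1, 1, 0, 0, 0, 0], dtype=bool),
    np.array([0, 0, 0, 1, 1, 1], dtype=bool),
]
# Hypothetical predictions: the first layer covers both objects,
# the second layer has lost the second object.
layer_preds = [
    [gt[0].copy(), gt[1].copy()],
    [gt[0].copy()],
]
recalls = layerwise_recall(layer_preds, gt)
```

A drop in this curve from one layer to the next would correspond to the "object disappearance" the abstract describes; a flat or rising curve is what the finding above reports for the fused decoder.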
Abstract
3D instance segmentation aims to predict a set of object instances in a scene and represent them as binary foreground masks with corresponding semantic labels. Currently, transformer-based methods are gaining increasing attention due to their elegant pipelines, reduced manual selection of geometric properties, and superior performance. However, transformer-based methods fail to simultaneously maintain strong position and content information during query initialization. Additionally, due to supervision at each decoder layer, there exists a phenomenon of object disappearance with the deepening of layers. To overcome these hurdles, we introduce Beyond the Final Layer: Hierarchical Query Fusion Transformer with Agent-Interpolation Initialization for 3D Instance Segmentation (BFL). Specifically, an Agent-Interpolation Initialization Module is designed to generate resilient queries capable of…
Taxonomy
Topics: 3D Shape Modeling and Analysis · Computer Graphics and Visualization Techniques · Industrial Vision Systems and Defect Detection
Methods: Attention Is All You Need · Label Smoothing · Byte Pair Encoding · Layer Normalization · Residual Connection · Dense Connections · Linear Layer · Multi-Head Attention · Position-Wise Feed-Forward Layer · Adam
