FAST3DIS: Feed-forward Anchored Scene Transformer for 3D Instance Segmentation

Changyang Li; Xueqing Huang; Shin-Fang Chng; Huangying Zhan; Qingan Yan; Yi Xu

arXiv:2603.25993 · cs.CV · March 30, 2026

TL;DR

FAST3DIS is an end-to-end 3D instance segmentation method built on a Transformer architecture. It avoids post-hoc clustering, improves efficiency, and retains the geometric priors of its depth backbone for better scene understanding.

Contribution

The paper introduces a novel query-based Transformer architecture with 3D anchors and contrastive learning for efficient, accurate 3D instance segmentation without post-hoc clustering.
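To make "without post-hoc clustering" concrete: in query-based (mask-classification) segmentation, each learned instance query scores every point directly, so instance masks come out of a dot product rather than a separate clustering pass. The sketch below assumes per-point embeddings and query vectors have already been computed; the function names `query_masks` and `assign_instances`, the sigmoid mask head, and the 0.5 threshold are illustrative assumptions, not details taken from FAST3DIS.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def query_masks(point_embeddings: List[Vector], queries: List[Vector]) -> List[List[float]]:
    """One soft mask per query: sigmoid(<query, embedding>) for every point.

    Every step is differentiable, so a mask loss can update the point
    embeddings and the queries jointly; there is no clustering step in between.
    """
    return [[1.0 / (1.0 + math.exp(-dot(q, p))) for p in point_embeddings] for q in queries]


def assign_instances(masks: List[List[float]], threshold: float = 0.5) -> List[int]:
    """Inference readout: give each point the index of its highest-scoring query,
    or -1 (unassigned) if no query's score exceeds `threshold`.
    Assumes at least one query."""
    labels = []
    for i in range(len(masks[0])):
        scores = [mask[i] for mask in masks]
        best = max(range(len(scores)), key=scores.__getitem__)
        labels.append(best if scores[best] > threshold else -1)
    return labels


# Toy usage (hypothetical data): two queries, four 2-D point embeddings.
points = [[2.0, 0.0], [1.5, 0.2], [0.0, 2.0], [-1.0, -1.0]]
queries = [[1.0, 0.0], [0.0, 1.0]]
masks = query_masks(points, queries)
labels = assign_instances(masks)  # [0, 0, 1, -1]
```

Because `query_masks` is differentiable end to end, a mask loss can shape the embeddings directly, which is the coupling the abstract says a non-differentiable clustering step breaks. Only the final argmax readout is discrete, and it is used at inference rather than during training.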

Findings

01. Achieves competitive accuracy on indoor 3D datasets.
02. Offers improved memory scalability and inference speed.
03. Effectively prevents query collisions with dual-level regularization.

Abstract

While recent feed-forward 3D reconstruction models provide a strong geometric foundation for scene understanding, extending them to 3D instance segmentation typically relies on a disjointed "lift-and-cluster" paradigm. Grouping dense pixel-wise embeddings via non-differentiable clustering scales poorly with the number of views and disconnects representation learning from the final segmentation objective. In this paper, we present a Feed-forward Anchored Scene Transformer for 3D Instance Segmentation (FAST3DIS), an end-to-end approach that effectively bypasses post-hoc clustering. We introduce a 3D-anchored, query-based Transformer architecture built upon a foundational depth backbone, adapted efficiently to learn instance-specific semantics while retaining its zero-shot geometric priors. We formulate a learned 3D anchor generator coupled with an anchor-sampling cross-attention mechanism…
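The abstract names a learned 3D anchor generator and an anchor-sampling cross-attention mechanism, but it is truncated before defining either. The sketch below is a hypothetical reading of "anchor sampling" in the spirit of multi-view 3D detectors such as DETR3D, which project 3D reference points into each camera and sample image features there. Each query carries one 3D anchor that is projected into every view with a pinhole camera model, the feature at the projected pixel is bilinearly sampled, and the query attends over those per-view samples. All names (`View`, `project`, `bilinear`, `anchor_sampling_attention`) and the single-head attention without learned projections are assumptions for illustration, not the paper's implementation.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec = List[float]
FeatureMap = List[List[Vec]]  # indexed [row][col][channel]


@dataclass
class View:
    features: FeatureMap  # per-pixel image features for this view (hypothetical layout)
    fx: float             # pinhole intrinsics
    fy: float
    cx: float
    cy: float
    R: List[Vec]          # 3x3 world-to-camera rotation
    t: Vec                # world-to-camera translation


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def project(anchor: Vec, view: View) -> Optional[Tuple[float, float]]:
    """Project a 3D world point to pixel (u, v); None if it lies behind the camera."""
    xc = [dot(view.R[i], anchor) + view.t[i] for i in range(3)]
    if xc[2] <= 1e-6:
        return None
    return view.fx * xc[0] / xc[2] + view.cx, view.fy * xc[1] / xc[2] + view.cy


def bilinear(fmap: FeatureMap, u: float, v: float) -> Optional[Vec]:
    """Bilinearly interpolate the feature at column u, row v; None if outside the map."""
    h, w = len(fmap), len(fmap[0])
    if not (0.0 <= u <= w - 1 and 0.0 <= v <= h - 1):
        return None
    x0, y0 = int(u), int(v)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = u - x0, v - y0
    return [
        (1 - ay) * ((1 - ax) * fmap[y0][x0][c] + ax * fmap[y0][x1][c])
        + ay * ((1 - ax) * fmap[y1][x0][c] + ax * fmap[y1][x1][c])
        for c in range(len(fmap[y0][x0]))
    ]


def anchor_sampling_attention(query: Vec, anchor: Vec, views: List[View]) -> Vec:
    """Single-head cross-attention from one query to the features sampled at its
    3D anchor's projection in every view. Sampled features serve as both keys and
    values, and their dimension is assumed to equal the query dimension."""
    samples = []
    for view in views:
        uv = project(anchor, view)
        if uv is not None:
            feat = bilinear(view.features, *uv)
            if feat is not None:
                samples.append(feat)
    if not samples:  # anchor is not visible in any view
        return [0.0] * len(query)
    scale = 1.0 / math.sqrt(len(query))
    logits = [scale * dot(query, f) for f in samples]
    peak = max(logits)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - peak) for s in logits]
    total = sum(weights)
    return [sum(w * f[c] for w, f in zip(weights, samples)) / total for c in range(len(query))]


# Toy usage: two identity cameras whose 3x3 feature maps are constant.
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]


def constant_map(feat: Vec) -> FeatureMap:
    return [[list(feat) for _ in range(3)] for _ in range(3)]


views = [
    View(constant_map([1.0, 0.0]), 1.0, 1.0, 1.0, 1.0, I3, [0.0, 0.0, 0.0]),
    View(constant_map([0.0, 1.0]), 1.0, 1.0, 1.0, 1.0, I3, [0.0, 0.0, 0.0]),
]
out = anchor_sampling_attention([1.0, 0.0], [0.0, 0.0, 1.0], views)
# The view whose sampled feature matches the query receives the larger weight.
```

A softmax over views lets a query down-weight views in which its anchor lands on unrelated content, and the cost scales with the number of queries times views rather than with every pixel in every view. A practical module would likely add learned query/key/value projections, multiple heads, and possibly several sampling points per anchor; the truncated abstract does not say which of these FAST3DIS uses.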
