MSTA3D: Multi-scale Twin-attention for 3D Instance Segmentation
Duc Dang Trung Tran, Byeongkeun Kang, Yeejin Lee

TL;DR
MSTA3D introduces a multi-scale twin-attention framework with spatial constraints to improve 3D instance segmentation, effectively addressing over-segmentation and mask prediction issues in transformer-based methods.
Contribution
It presents a novel multi-scale twin-attention mechanism combined with a box query and a box regularizer, improving segmentation accuracy over existing transformer-based approaches.
Findings
Outperforms state-of-the-art methods on ScanNetV2, ScanNet200, and S3DIS datasets.
Effectively reduces over-segmentation in large objects.
Improves mask prediction reliability.
Abstract
Recently, transformer-based techniques incorporating superpoints have become prevalent in 3D instance segmentation. However, they often encounter an over-segmentation problem, especially noticeable with large objects. Additionally, unreliable mask predictions stemming from superpoint mask prediction further compound this issue. To address these challenges, we propose a novel framework called MSTA3D. It leverages multi-scale feature representation and introduces a twin-attention mechanism to effectively capture these multi-scale features. Furthermore, MSTA3D integrates a box query with a box regularizer, offering a complementary spatial constraint alongside semantic queries. Experimental evaluations on the ScanNetV2, ScanNet200, and S3DIS datasets demonstrate that our approach surpasses state-of-the-art 3D instance segmentation methods.
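The abstract does not spell out how the twin-attention mechanism is built, so the following is only a rough illustration of the general idea of letting a set of instance queries attend to features at two scales. Everything here is an assumption made for illustration: the function names (`cross_attention`, `twin_scale_attention`), the single-head dot-product attention, the residual sum used to fuse the two branches, and the random stand-in features. It should not be read as the authors' architecture.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, features):
    # Single-head scaled dot-product cross-attention.
    # queries: (Q, D), features: (N, D) -> output: (Q, D)
    d = queries.shape[-1]
    scores = queries @ features.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)  # each query's weights sum to 1
    return weights @ features


def twin_scale_attention(queries, fine_feats, coarse_feats):
    # Hypothetical fusion: each query attends to a fine-scale and a
    # coarse-scale feature set separately; the two results are added
    # to the query as a residual. The paper's actual fusion may differ.
    fine_out = cross_attention(queries, fine_feats)
    coarse_out = cross_attention(queries, coarse_feats)
    return queries + fine_out + coarse_out


# Random stand-ins for instance queries and superpoint features.
rng = np.random.default_rng(0)
queries = rng.normal(size=(4, 16))        # 4 queries, 16-dim
fine_feats = rng.normal(size=(200, 16))   # many fine-scale superpoints
coarse_feats = rng.normal(size=(50, 16))  # fewer coarse-scale superpoints

updated = twin_scale_attention(queries, fine_feats, coarse_feats)
print(updated.shape)
```

In this sketch the coarse branch gives each query a view over larger regions, which is one plausible way a multi-scale design could help with large objects that would otherwise be split into several instances.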
