MSTA3D: Multi-scale Twin-attention for 3D Instance Segmentation

Duc Dang Trung Tran; Byeongkeun Kang; Yeejin Lee

arXiv:2411.01781 · cs.CV · November 12, 2024

TL;DR

MSTA3D introduces a multi-scale twin-attention framework with spatial constraints to improve 3D instance segmentation, effectively addressing over-segmentation and mask prediction issues in transformer-based methods.

Contribution

The paper presents a novel multi-scale twin-attention mechanism combined with a box query and a box regularizer, improving segmentation accuracy over existing transformer-based approaches.

Findings

1. Outperforms state-of-the-art methods on ScanNetV2, ScanNet200, and S3DIS datasets.

2. Effectively reduces over-segmentation in large objects.

3. Improves mask prediction reliability.

Abstract

Recently, transformer-based techniques incorporating superpoints have become prevalent in 3D instance segmentation. However, they often encounter an over-segmentation problem, which is especially noticeable with large objects. Additionally, unreliable mask predictions stemming from superpoint mask prediction further compound this issue. To address these challenges, we propose a novel framework called MSTA3D. It leverages a multi-scale feature representation and introduces a twin-attention mechanism to effectively capture these multi-scale features. Furthermore, MSTA3D integrates a box query with a box regularizer, offering a complementary spatial constraint alongside semantic queries. Experimental evaluations on the ScanNetV2, ScanNet200, and S3DIS datasets demonstrate that our approach surpasses state-of-the-art 3D instance segmentation methods.
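
For readers unfamiliar with superpoint-based pipelines, the following is a minimal sketch of the general idea the abstract builds on. It is not the MSTA3D implementation. It assumes that points are pre-grouped into superpoints, that features are averaged per superpoint, and that a mask predicted per superpoint is copied back to every point in it. All function names, the toy scene, and the axis-aligned box helper are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def pool_superpoint_features(point_feats, superpoint_ids):
    """Average per-point feature vectors within each superpoint (assumed pooling)."""
    sums = {}
    counts = defaultdict(int)
    for feat, sp in zip(point_feats, superpoint_ids):
        if sp not in sums:
            sums[sp] = [0.0] * len(feat)
        sums[sp] = [s + f for s, f in zip(sums[sp], feat)]
        counts[sp] += 1
    return {sp: [s / counts[sp] for s in total] for sp, total in sums.items()}


def broadcast_mask_to_points(superpoint_mask, superpoint_ids):
    """Each point inherits the mask decision of the superpoint it belongs to."""
    return [superpoint_mask[sp] for sp in superpoint_ids]


def mask_bounding_box(points, point_mask):
    """Axis-aligned box (min corner, max corner) of the masked points, or None."""
    selected = [p for p, m in zip(points, point_mask) if m]
    if not selected:
        return None
    mins = tuple(min(p[i] for p in selected) for i in range(3))
    maxs = tuple(max(p[i] for p in selected) for i in range(3))
    return mins, maxs


# Toy scene: five points grouped into three superpoints (hypothetical data).
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (4.0, 2.0, 1.0), (5.0, 2.0, 1.0), (9.0, 9.0, 0.0)]
point_feats = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0], [5.0, 5.0]]
superpoint_ids = [0, 0, 1, 1, 2]

sp_feats = pool_superpoint_features(point_feats, superpoint_ids)

# One instance mask expressed over superpoints: superpoints 0 and 1 belong to it.
superpoint_mask = {0: True, 1: True, 2: False}
point_mask = broadcast_mask_to_points(superpoint_mask, superpoint_ids)
box = mask_bounding_box(points, point_mask)
```

Because every point inherits its superpoint's decision, a single wrong superpoint prediction flips all of its points at once. If superpoint 1 were dropped from the mask above, half of the object would be lost, which is one way a large object can end up split across several predicted instances. A box such as the one computed here describes the spatial extent of an instance, the kind of quantity a box-based spatial constraint could reason about.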
