3D Learnable Supertoken Transformer for LiDAR Point Cloud Scene Segmentation

Dening Lu; Jun Zhou; Kyle Gao; Linlin Xu; Jonathan Li

arXiv:2405.15826·cs.CV·December 30, 2024·2 cites

TL;DR

This paper introduces 3DLST, a novel 3D Transformer framework for LiDAR point cloud scene segmentation that employs learnable supertokens and a W-net architecture, achieving state-of-the-art accuracy and efficiency.

Contribution

The paper proposes the first Dynamic Supertoken Optimization (DSO) block and a Cross-Attention-guided Upsampling method, improving the efficiency and semantic clustering of 3D point cloud segmentation.

Findings

01 Achieves state-of-the-art performance on multiple LiDAR datasets.

02 Up to 5x faster than previous methods.

03 Demonstrates strong adaptability across various LiDAR data types.

Abstract

3D Transformers have achieved great success in point cloud understanding and representation. However, there is still considerable scope for further development of effective and efficient Transformers for large-scale LiDAR point cloud scene segmentation. This paper proposes a novel 3D Transformer framework, named 3D Learnable Supertoken Transformer (3DLST). The key contributions are summarized as follows. First, we introduce the first Dynamic Supertoken Optimization (DSO) block for efficient token clustering and aggregation, where the learnable supertoken definition avoids the time-consuming pre-processing of traditional superpoint generation. Because the learnable supertokens can be dynamically optimized by multi-level deep features during network learning, they are tailored to semantic-homogeneity-aware token clustering. Second, an efficient Cross-Attention-guided Upsampling…
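To make the idea of clustering tokens around learnable supertokens more concrete, here is a minimal NumPy sketch. It is not the paper's DSO block or its upsampling method. It assumes a soft assignment computed by scaled dot-product attention between point tokens and supertokens, and it reuses that assignment matrix to spread supertoken features back to the points. The function names (`cluster_tokens`, `upsample`), the shapes, and the random inputs are all hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cluster_tokens(tokens, supertokens):
    """Softly assign N tokens to M supertokens and aggregate them.

    tokens:      (N, C) per-point features
    supertokens: (M, C) cluster prototypes (learnable in a real network;
                 plain arrays here, an assumption of this sketch)
    Returns the assignment matrix (N, M) and updated supertokens (M, C).
    """
    channels = tokens.shape[1]
    scores = tokens @ supertokens.T / np.sqrt(channels)  # (N, M) similarities
    assign = softmax(scores, axis=1)  # each token's weights over supertokens
    # Normalize per supertoken so each update is a weighted mean of tokens.
    weights = assign / (assign.sum(axis=0, keepdims=True) + 1e-8)
    updated = weights.T @ tokens  # (M, C)
    return assign, updated


def upsample(assign, supertoken_feats):
    """Spread supertoken features back to points via the soft assignment."""
    return assign @ supertoken_feats  # (N, C)


rng = np.random.default_rng(0)
tokens = rng.normal(size=(1000, 32))      # hypothetical point tokens
supertokens = rng.normal(size=(16, 32))   # hypothetical supertokens
assign, updated = cluster_tokens(tokens, supertokens)
dense = upsample(assign, updated)
```

In this sketch, attention over the 16 supertokens costs O(N·M) instead of the O(N²) of full self-attention over all points, which is the general reason a small set of cluster tokens can make large scenes cheaper to process.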

Taxonomy

Topics: Optical Systems and Laser Technology · Remote Sensing and LiDAR Applications · 3D Surveying and Cultural Heritage

Methods: Attention Is All You Need · Concatenated Skip Connection · Max Pooling · Convolution · Byte Pair Encoding · Linear Layer · Absolute Position Encodings · Dropout · Softmax