3D Learnable Supertoken Transformer for LiDAR Point Cloud Scene Segmentation
Dening Lu, Jun Zhou, Kyle Gao, Linlin Xu, Jonathan Li

TL;DR
This paper introduces 3DLST, a novel 3D Transformer framework for LiDAR point cloud scene segmentation that employs learnable supertokens and a W-net architecture, achieving state-of-the-art accuracy and efficiency.
Contribution
The paper proposes the first dynamic supertoken optimization block and a cross-attention-guided upsampling method, enhancing efficiency and semantic clustering in 3D point cloud segmentation.
Findings
Achieves state-of-the-art performance on multiple LiDAR datasets.
Up to 5x faster than previous methods.
Demonstrates strong adaptability across various LiDAR data types.
Abstract
3D Transformers have achieved great success in point cloud understanding and representation. However, there is still considerable scope for further development in effective and efficient Transformers for large-scale LiDAR point cloud scene segmentation. This paper proposes a novel 3D Transformer framework, named 3D Learnable Supertoken Transformer (3DLST). The key contributions are summarized as follows. Firstly, we introduce the first Dynamic Supertoken Optimization (DSO) block for efficient token clustering and aggregating, where the learnable supertoken definition avoids the time-consuming pre-processing of traditional superpoint generation. Since the learnable supertokens can be dynamically optimized by multi-level deep features during network learning, they are tailored to the semantic homogeneity-aware token clustering. Secondly, an efficient Cross-Attention-guided Upsampling…
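The abstract's two mechanisms, clustering tokens around learnable supertokens and upsampling with cross-attention, can be illustrated with a small NumPy sketch. This is not the authors' DSO or upsampling block, whose details are not given here. It assumes plain scaled dot-product attention, and the function names (`cluster_tokens`, `cross_attention_upsample`), the feature sizes, and the random initialisation are hypothetical choices made for illustration only.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cluster_tokens(tokens, supertokens):
    """Softly assign N tokens to M supertokens, then aggregate (illustrative only).

    tokens:      (N, C) point-token features
    supertokens: (M, C) supertoken features (learnable parameters in a real network)
    Returns the (N, M) soft assignment and the (M, C) aggregated supertokens.
    """
    scale = 1.0 / np.sqrt(tokens.shape[1])
    # Each token distributes one unit of weight over the supertokens.
    assign = softmax(tokens @ supertokens.T * scale, axis=1)
    # Normalise per supertoken so each update is a weighted mean of its tokens.
    weights = assign / (assign.sum(axis=0, keepdims=True) + 1e-8)
    updated = weights.T @ tokens
    return assign, updated

def cross_attention_upsample(tokens, supertokens):
    """Propagate supertoken features back to tokens via cross-attention.

    Tokens act as queries; supertokens act as keys and values.
    """
    scale = 1.0 / np.sqrt(tokens.shape[1])
    attn = softmax(tokens @ supertokens.T * scale, axis=1)  # (N, M)
    return attn @ supertokens                               # (N, C)

# Hypothetical sizes: 1024 point tokens, 16 supertokens, 32 channels.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(1024, 32))
supertokens = rng.normal(size=(16, 32))  # stands in for a learned parameter

assign, updated = cluster_tokens(tokens, supertokens)
upsampled = cross_attention_upsample(tokens, updated)
```

In a trained network the supertokens would be parameters optimised by backpropagation rather than random draws, which is what would remove the need for a separate superpoint pre-processing step.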
Taxonomy
Topics: Optical Systems and Laser Technology · Remote Sensing and LiDAR Applications · 3D Surveying and Cultural Heritage
Methods: Attention Is All You Need · Concatenated Skip Connection · Max Pooling · Convolution · Byte Pair Encoding · Linear Layer · Absolute Position Encodings · Dropout · Softmax
