Point Transformer V2: Grouped Vector Attention and Partition-based Pooling
Xiaoyang Wu, Yixing Lao, Li Jiang, Xihui Liu, Hengshuang Zhao

TL;DR
Point Transformer V2 introduces grouped vector attention and partition-based pooling to enhance 3D point cloud understanding, achieving state-of-the-art results on multiple benchmarks with improved efficiency and effectiveness.
Contribution
The paper proposes novel grouped vector attention and partition-based pooling methods that improve upon previous transformer models for 3D point cloud tasks.
Findings
Achieves state-of-the-art results on ScanNet v2 and S3DIS segmentation benchmarks.
Outperforms previous models on ModelNet40 classification.
Partition-based pooling gives better spatial alignment and more efficient sampling than the pooling used in earlier point transformers.
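As an illustration of the pooling idea behind the last finding, here is a minimal sketch that assumes the partition is a uniform grid of non-overlapping cells. Points that fall in the same cell are fused: coordinates are averaged and features are max-pooled. The function name `grid_pool`, the parameter `cell_size`, and the choice of mean and max as the fusion operators are assumptions made for this example, not the authors' released code.

```python
import math
from collections import defaultdict


def grid_pool(points, features, cell_size):
    """Fuse all points that share a grid cell (hypothetical helper).

    points:    list of coordinate tuples, e.g. (x, y, z)
    features:  list of per-point feature lists, all of the same length
    cell_size: edge length of the cubic grid cells (assumed uniform)

    Returns (pooled_points, pooled_features, assignment), where
    assignment[i] is the index of the pooled point that absorbed point i.
    """
    # Group point indices by the integer coordinates of their cell.
    cells = defaultdict(list)
    for idx, point in enumerate(points):
        key = tuple(math.floor(coord / cell_size) for coord in point)
        cells[key].append(idx)

    dims = len(points[0])
    channels = len(features[0])
    pooled_points, pooled_features = [], []
    assignment = [None] * len(points)

    # Sort cells so the output order is deterministic.
    for cell_id, (_, members) in enumerate(sorted(cells.items())):
        count = len(members)
        # Mean of coordinates keeps the pooled point inside its cell.
        pooled_points.append(
            tuple(sum(points[i][d] for i in members) / count for d in range(dims))
        )
        # Channel-wise max over the features of the cell's members.
        pooled_features.append(
            [max(features[i][c] for i in members) for c in range(channels)]
        )
        for i in members:
            assignment[i] = cell_id

    return pooled_points, pooled_features, assignment
```

Because every input point belongs to exactly one cell, the `assignment` list can be reused later to copy pooled features back to the original points, which is one plausible way to keep the downsampled and full-resolution sets aligned.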
Abstract
As a pioneering work exploring transformer architecture for 3D point cloud understanding, Point Transformer achieves impressive results on multiple highly competitive benchmarks. In this work, we analyze the limitations of the Point Transformer and propose our powerful and efficient Point Transformer V2 model with novel designs that overcome the limitations of previous work. In particular, we first propose grouped vector attention, which is more effective than the previous version of vector attention. Inheriting the advantages of both learnable weight encoding and multi-head attention, we present a highly effective implementation of grouped vector attention with a novel grouped weight encoding layer. We also strengthen the position information for attention by an additional position encoding multiplier. Furthermore, we design novel and lightweight partition-based pooling methods which…
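To make the grouped vector attention described in the abstract concrete, the following is a minimal sketch for a single query point and its neighbors. It assumes the relation between query and key is their difference, and it replaces the learned grouped weight encoding layer with a fixed per-channel vector `group_proj` that is summed within each group. Both choices, the function name, and all parameter names are assumptions for illustration. Position encoding, including the multiplier mentioned in the abstract, is omitted.

```python
import math


def grouped_vector_attention(query, keys, values, groups, group_proj):
    """Aggregate neighbor values for one point (illustrative sketch only).

    query:      list of C floats for the center point
    keys:       list of neighbor key vectors, each of length C
    values:     list of neighbor value vectors, each of length C
    groups:     number of channel groups g; C must be divisible by g
    group_proj: list of C floats, a toy stand-in for learned weights
    """
    channels = len(query)
    assert channels % groups == 0, "channels must split evenly into groups"
    size = channels // groups

    # One attention logit per (neighbor, group) instead of per channel.
    logits = []
    for key in keys:
        relation = [q - k for q, k in zip(query, key)]  # subtraction relation
        logits.append([
            sum(group_proj[c] * relation[c]
                for c in range(g * size, (g + 1) * size))
            for g in range(groups)
        ])

    out = [0.0] * channels
    for g in range(groups):
        # Softmax over neighbors, computed separately for each group.
        column = [row[g] for row in logits]
        peak = max(column)
        exps = [math.exp(x - peak) for x in column]
        total = sum(exps)
        for j, e in enumerate(exps):
            weight = e / total
            # Every channel in the group shares this single weight.
            for c in range(g * size, (g + 1) * size):
                out[c] += weight * values[j][c]
    return out
```

With `groups` equal to the channel count, each channel gets its own weight, which corresponds to plain vector attention; with `groups` equal to 1, all channels share one weight per neighbor, as in scalar attention. Intermediate values trade off between the two, which is the sense in which the design resembles multi-head attention.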
Taxonomy
Topics: Remote Sensing and LiDAR Applications · 3D Surveying and Cultural Heritage · 3D Shape Modeling and Analysis
Methods: Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Residual Connection · Dropout · Adam · Dense Connections · Softmax · Label Smoothing · Multi-Head Attention
