Point Transformer V2: Grouped Vector Attention and Partition-based Pooling

Xiaoyang Wu; Yixing Lao; Li Jiang; Xihui Liu; Hengshuang Zhao

arXiv:2210.05666 · cs.CV · October 13, 2022 · 190 cites

TL;DR

Point Transformer V2 introduces grouped vector attention and partition-based pooling to enhance 3D point cloud understanding, achieving state-of-the-art results on multiple benchmarks with improved efficiency and effectiveness.

Contribution

The paper proposes novel grouped vector attention and partition-based pooling methods that improve upon previous transformer models for 3D point cloud tasks.

Findings

01 Achieves state-of-the-art results on ScanNet v2 and S3DIS segmentation benchmarks.

02 Outperforms previous models on ModelNet40 classification.

03 Demonstrates improved efficiency and spatial alignment in point cloud processing.

Abstract

As a pioneering work exploring transformer architecture for 3D point cloud understanding, Point Transformer achieves impressive results on multiple highly competitive benchmarks. In this work, we analyze the limitations of the Point Transformer and propose our powerful and efficient Point Transformer V2 model with novel designs that overcome the limitations of previous work. In particular, we first propose group vector attention, which is more effective than the previous version of vector attention. Inheriting the advantages of both learnable weight encoding and multi-head attention, we present a highly effective implementation of grouped vector attention with a novel grouped weight encoding layer. We also strengthen the position information for attention by an additional position encoding multiplier. Furthermore, we design novel and lightweight partition-based pooling methods which…
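The grouped vector attention idea described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes a subtraction relation between query and keys, stands in a single linear map (the hypothetical `w_enc`) for the grouped weight encoding layer, and leaves out position encoding entirely. The function name and all parameter names are made up for illustration. The point it shows is that value channels are split into groups and every channel within a group shares one attention weight per neighbor, instead of one weight per channel (plain vector attention) or one weight per head (scalar attention).

```python
import numpy as np

def grouped_vector_attention(q, k, v, w_enc, groups):
    """Illustrative grouped vector attention for a single query point.

    q:      (c,)        query feature of the point
    k:      (n, c)      key features of its n neighbors
    v:      (n, c)      value features of its n neighbors
    w_enc:  (c, groups) hypothetical linear weight encoding (an assumption;
                        a simplification of a learned encoding layer)
    groups: number of channel groups; must divide c
    """
    n, c = v.shape
    assert c % groups == 0, "channel count must be divisible by groups"

    # Assumed relation function: difference between query and each key.
    relation = q[None, :] - k                      # (n, c)

    # Map each relation vector to one logit per group.
    logits = relation @ w_enc                      # (n, groups)

    # Softmax over the neighbor axis, independently for every group.
    logits = logits - logits.max(axis=0, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=0, keepdims=True)

    # Split value channels into groups; channels in a group share a weight.
    v_grouped = v.reshape(n, groups, c // groups)  # (n, groups, c/groups)
    out = (weights[:, :, None] * v_grouped).sum(axis=0)
    return out.reshape(c)

# Usage with random data: 5 neighbors, 8 channels, 4 groups.
rng = np.random.default_rng(0)
q = rng.normal(size=8)
k = rng.normal(size=(5, 8))
v = rng.normal(size=(5, 8))
w_enc = rng.normal(size=(8, 4))
out = grouped_vector_attention(q, k, v, w_enc, groups=4)
```

In this sketch, setting `groups` equal to the channel count gives one weight per channel, and setting it to 1 gives a single scalar weight per neighbor, so the group count trades expressiveness against the size of the weight encoding.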

Code & Models

Videos

Point Transformer V2: Grouped Vector Attention and Partition-based Pooling · slideslive

Taxonomy

Topics: Remote Sensing and LiDAR Applications · 3D Surveying and Cultural Heritage · 3D Shape Modeling and Analysis

Methods: Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Residual Connection · Dropout · Adam · Dense Connections · Softmax · Label Smoothing · Multi-Head Attention