Point Cloud Learning with Transformer

Qi Zhong; Xian-Feng Han

arXiv:2104.13636·cs.CV·October 26, 2022·1 cites

Open Access

TL;DR

This paper introduces MLMSPT, a transformer-based framework for point cloud analysis that captures multi-scale features and contextual information, achieving competitive results in 3D shape classification and segmentation.

Contribution

The paper proposes a novel multi-level multi-scale transformer architecture specifically designed for irregular point cloud data, enhancing feature representation and interaction.

Findings

01. Effective on benchmark datasets for 3D shape classification.

02. Achieves competitive performance in segmentation tasks.

03. Demonstrates the benefit of multi-scale and multi-level modeling.

Abstract

The remarkable performance of Transformer networks in Natural Language Processing has promoted the development of these models for computer vision tasks such as image recognition and segmentation. In this paper, we introduce a novel framework, called Multi-level Multi-scale Point Transformer (MLMSPT), that works directly on irregular point clouds for representation learning. Specifically, a point pyramid transformer is investigated to model features at the diverse resolutions or scales we define, followed by a multi-level transformer module that aggregates contextual information from different levels of each scale and enhances their interactions, while a multi-scale transformer module is designed to capture the dependencies among representations across different scales. Extensive evaluation on public benchmark datasets demonstrates the effectiveness and the competitive performance of…
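The idea of attending within each scale and then across scales can be illustrated with a small NumPy sketch. This is not the MLMSPT architecture from the paper: the random subsampling, the shared untrained random weights, the max pooling, and the names `self_attention` and `multi_scale_features` are all assumptions made for illustration only.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(feats, w_q, w_k, w_v):
    # Scaled dot-product self-attention over a set of feature vectors.
    q, k, v = feats @ w_q, feats @ w_k, feats @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    return softmax(scores, axis=-1) @ v


def multi_scale_features(points, scales, dim=16, seed=0):
    # Hypothetical helper: returns one feature vector per scale.
    rng = np.random.default_rng(seed)
    # Untrained random projections (assumption: a real model would learn
    # these, and would likely use separate weights per module).
    w_in = rng.normal(size=(3, dim)) / np.sqrt(3)
    w_q, w_k, w_v = (rng.normal(size=(dim, dim)) / np.sqrt(dim) for _ in range(3))

    pooled = []
    for n in scales:
        # Assumption: random subsampling stands in for whatever sampling
        # strategy defines each resolution in the paper.
        idx = rng.choice(len(points), size=n, replace=False)
        feats = points[idx] @ w_in                    # (n, dim)
        attended = self_attention(feats, w_q, w_k, w_v)
        pooled.append(attended.max(axis=0))           # one token per scale

    tokens = np.stack(pooled)                         # (num_scales, dim)
    # Attention across the per-scale tokens models cross-scale dependencies.
    return self_attention(tokens, w_q, w_k, w_v)


points = np.random.default_rng(1).normal(size=(128, 3))
fused = multi_scale_features(points, scales=[128, 64, 32])
```

Here `fused` holds one row per requested scale, each row mixing information from the other scales through the final attention step. Because attention treats its input as an unordered set, the same routine applies to point subsets of any size without assuming a grid layout.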

Taxonomy

Topics: 3D Shape Modeling and Analysis · 3D Surveying and Cultural Heritage · Optical measurement and interference techniques

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Softmax · Dropout · Layer Normalization · Byte Pair Encoding