Point Cloud Learning with Transformer
Qi Zhong, Xian-Feng Han

TL;DR
This paper introduces MLMSPT, a transformer-based framework for point cloud analysis that captures multi-scale features and contextual information, achieving competitive results in 3D shape classification and segmentation.
Contribution
The paper proposes a novel multi-level multi-scale transformer architecture specifically designed for irregular point cloud data, enhancing feature representation and interaction.
Findings
Effective on benchmark datasets for 3D shape classification.
Achieves competitive performance in segmentation tasks.
Demonstrates the benefit of multi-scale and multi-level modeling.
Abstract
Remarkable performance from Transformer networks in Natural Language Processing has promoted the development of these models for computer vision tasks such as image recognition and segmentation. In this paper, we introduce a novel framework, called Multi-level Multi-scale Point Transformer (MLMSPT), that works directly on irregular point clouds for representation learning. Specifically, a point pyramid transformer is used to model features at the diverse resolutions or scales we define, followed by a multi-level transformer module that aggregates contextual information from different levels of each scale and enhances their interactions. In addition, a multi-scale transformer module is designed to capture the dependencies among representations across different scales. Extensive evaluations on public benchmark datasets demonstrate the effectiveness and the competitive performance of…
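The abstract describes attention across representations computed at several point-cloud resolutions. The following is a minimal NumPy sketch of that general idea, not the authors' implementation: the random subsampling, the max-pooled per-scale tokens, the random projection weights, and all function names (`build_pyramid`, `self_attention`) are assumptions made for illustration, since the abstract does not specify these details.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens, w_q, w_k, w_v):
    # Single-head scaled dot-product self-attention over a (n, d) token matrix.
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


def build_pyramid(points, sizes, rng):
    # Hypothetical helper: random subsampling stands in for whatever
    # sampling scheme the paper actually uses to define its scales.
    return [points[rng.choice(len(points), size=n, replace=False)] for n in sizes]


rng = np.random.default_rng(0)
points = rng.normal(size=(256, 3))            # toy point cloud, xyz only
pyramid = build_pyramid(points, sizes=(128, 64, 32), rng=rng)

d = 16                                        # assumed feature width
w_embed = rng.normal(size=(3, d)) * 0.1       # random stand-in for a learned embedding

# One token per scale: embed each point, then max-pool over the points (assumption).
scale_tokens = np.stack([(p @ w_embed).max(axis=0) for p in pyramid])

w_q, w_k, w_v = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
attended, weights = self_attention(scale_tokens, w_q, w_k, w_v)
fused = scale_tokens + attended               # residual connection across scales
```

Here each row of `weights` indicates how strongly one scale attends to every scale, which is one simple way to model cross-scale dependencies. A trained model would learn the projection matrices rather than draw them at random.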
Taxonomy
Topics: 3D Shape Modeling and Analysis · 3D Surveying and Cultural Heritage · Optical measurement and interference techniques
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Softmax · Dropout · Layer Normalization · Byte Pair Encoding
