Transformers in 3D Point Clouds: A Survey
Dening Lu, Qian Xie, Mingqiang Wei, Kyle Gao, Linlin Xu, Jonathan Li

TL;DR
This survey reviews Transformer architectures for 3D point cloud analysis, covering theory, taxonomy, variants, and performance comparisons across tasks. It also highlights current challenges and future directions.
Contribution
It provides the first systematic survey of Transformer methods in 3D point cloud processing, including taxonomy, analysis of variants, and performance evaluation across multiple tasks.
Findings
Transformer-based models achieve strong results across 3D point cloud tasks.
Self-attention variants improve both efficiency and effectiveness.
In the survey's comparisons, Transformers outperform non-Transformer methods in classification, segmentation, and object detection.
Abstract
Transformers have been at the heart of the Natural Language Processing (NLP) and Computer Vision (CV) revolutions. Their significant success in NLP and CV has inspired the use of Transformers in point cloud processing. However, how do Transformers cope with the irregularity and unordered nature of point clouds? How suitable are Transformers for different 3D representations (e.g., point- or voxel-based)? How competent are Transformers for various 3D processing tasks? As of now, there is still no systematic survey of the research on these issues. For the first time, we provide a comprehensive overview of increasingly popular Transformers for 3D point cloud analysis. We start by introducing the theory of the Transformer architecture and reviewing its applications in 2D/3D fields. Then, we present three different taxonomies (i.e., implementation-, data representation-, and task-based),…
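One general property helps with the abstract's question about unordered inputs: plain scaled dot-product self-attention without positional encoding is permutation-equivariant, so reordering the input points only reorders the output rows. The sketch below is a minimal NumPy illustration of that property, not code from the survey. The function name `self_attention`, the random weights, and the sizes (8 points, 4 feature channels) are all assumptions chosen for illustration.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over N points.

    No positional encoding is applied, so the points are treated as a set.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v

rng = np.random.default_rng(0)
points = rng.normal(size=(8, 3))  # hypothetical cloud: 8 points with xyz coordinates
w_q, w_k, w_v = (rng.normal(size=(3, 4)) for _ in range(3))  # illustrative weights

perm = rng.permutation(8)  # an arbitrary reordering of the points
out = self_attention(points, w_q, w_k, w_v)
out_perm = self_attention(points[perm], w_q, w_k, w_v)

# Shuffling the input rows shuffles the output rows in the same way.
is_equivariant = np.allclose(out[perm], out_perm)
```

Equivariance alone does not address the irregular sampling or the quadratic cost of attention over large clouds, both of which the abstract raises as open questions.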
Taxonomy
Topics: Advanced Neural Network Applications · 3D Surveying and Cultural Heritage · 3D Shape Modeling and Analysis
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dropout · Adam · Byte Pair Encoding · Residual Connection · Label Smoothing · Position-Wise Feed-Forward Layer · Absolute Position Encodings
