3DCTN: 3D Convolution-Transformer Network for Point Cloud Classification
Dening Lu, Qian Xie, Linlin Xu, Jonathan Li

TL;DR
This paper introduces 3DCTN, a hierarchical network combining convolution and Transformer for efficient and accurate point cloud classification, achieving state-of-the-art results on ModelNet40.
Contribution
The paper proposes a novel hierarchical framework that integrates convolution and Transformer modules for improved point cloud classification.
Findings
Achieves state-of-the-art accuracy on ModelNet40
Demonstrates improved efficiency over pure Transformer models
Provides insights into Transformer variants for 3D data
Abstract
Accurate and fast point cloud classification is a fundamental task in 3D applications, but it is hard to achieve: the irregularity and disorder of point clouds make effective and efficient learning of global discriminative features challenging. Recently, 3D Transformers have been adopted to improve point cloud processing. Nevertheless, massive Transformer layers tend to incur huge computational and memory costs. This paper presents a novel hierarchical framework that incorporates convolution with Transformer for point cloud classification, named 3D Convolution-Transformer Network (3DCTN), to combine the strong and efficient local feature learning ability of convolution with the remarkable global context modeling capability of Transformer. Our method has two main modules operating on the downsampled point sets, and each module consists of a…
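The idea in the abstract, local feature learning followed by global context modeling on progressively downsampled point sets, can be illustrated with a small sketch. This is not the authors' implementation: the abstract is truncated before it describes the modules, so every function name and design choice here is an assumption made for illustration. Farthest point sampling is assumed for downsampling, a k-nearest-neighbour max-pool stands in for the convolution step, and a single-head self-attention with identity projections stands in for the Transformer step. Plain Python is used so the sketch runs without extra dependencies.

```python
import math
import random


def farthest_point_sampling(points, m):
    """Return m indices spread over the cloud (assumed downsampling choice)."""
    chosen = [0]
    dist = [math.dist(p, points[0]) for p in points]
    while len(chosen) < m:
        nxt = max(range(len(points)), key=dist.__getitem__)
        chosen.append(nxt)
        # Track each point's distance to its nearest already-chosen point.
        dist = [min(d, math.dist(p, points[nxt])) for d, p in zip(dist, points)]
    return chosen


def local_aggregate(points, feats, centers, k):
    """Stand-in for the local step: max-pool features over k nearest neighbours."""
    channels = len(feats[0])
    pooled = []
    for c in centers:
        order = sorted(range(len(points)),
                       key=lambda j: math.dist(points[j], points[c]))
        nbrs = order[:k]
        pooled.append([max(feats[j][ch] for j in nbrs) for ch in range(channels)])
    return pooled


def global_attention(feats):
    """Stand-in for the global step: single-head self-attention with
    identity Q/K/V projections and a residual connection."""
    d = len(feats[0])
    out = []
    for q in feats:
        scores = [sum(a * b for a, b in zip(q, key)) / math.sqrt(d) for key in feats]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]  # numerically stable softmax
        total = sum(weights)
        ctx = [sum(w * v[ch] for w, v in zip(weights, feats)) / total
               for ch in range(d)]
        out.append([a + b for a, b in zip(q, ctx)])
    return out


def stage(points, feats, m, k):
    """One hypothetical stage: downsample, aggregate locally, attend globally."""
    idx = farthest_point_sampling(points, m)
    local = local_aggregate(points, feats, idx, k)
    return [points[i] for i in idx], global_attention(local)


random.seed(0)
cloud = [[random.random() for _ in range(3)] for _ in range(64)]
pts, f = cloud, cloud  # xyz coordinates double as the initial features
for m in (32, 8):      # two stages, each on a smaller point set
    pts, f = stage(pts, f, m, k=4)
# Max-pool over the remaining points to get one global descriptor.
global_feature = [max(row[ch] for row in f) for ch in range(3)]
```

In this sketch the attention step runs only on the downsampled sets (32 and then 8 points), so its quadratic cost applies to far fewer points than the original 64. That is one plausible reading of how a hierarchical design can reduce the cost of Transformer layers.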
Taxonomy
Topics: 3D Surveying and Cultural Heritage · 3D Shape Modeling and Analysis · Optical measurement and interference techniques
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dense Connections · Softmax · Absolute Position Encodings · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Residual Connection · Layer Normalization
