TL;DR
Fast Point Transformer pairs a lightweight self-attention layer with a voxel hashing-based architecture to process large-scale 3D point clouds efficiently. It reports inference 129 times faster than Point Transformer while staying competitive in accuracy with the best voxel-based method, and is demonstrated on 3D semantic segmentation and 3D detection.
Contribution
It proposes a lightweight self-attention layer that encodes continuous 3D coordinates, combined with a voxel hashing-based architecture, for efficient processing of large-scale 3D point clouds.
Findings
129 times faster inference than Point Transformer
Accuracy competitive with the best voxel-based method
Demonstrated on both 3D semantic segmentation and 3D detection
Abstract
The recent success of neural networks enables a better interpretation of 3D point clouds, but processing a large-scale 3D scene remains a challenging problem. Most current approaches divide a large-scale scene into small regions and combine the local predictions together. However, this scheme inevitably involves additional stages for pre- and post-processing and may also degrade the final output due to predictions in a local perspective. This paper introduces Fast Point Transformer that consists of a new lightweight self-attention layer. Our approach encodes continuous 3D coordinates, and the voxel hashing-based architecture boosts computational efficiency. The proposed method is demonstrated with 3D semantic segmentation and 3D detection. The accuracy of our approach is competitive to the best voxel-based method, and our network achieves 129 times faster inference time than the…
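The voxel hashing idea mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`voxel_key`, `build_voxel_hash`, `occupied_neighbors`, `voxel_centroid`) and the `voxel_size` and `radius` parameters are hypothetical, and the centroid step is only one assumed way of keeping continuous coordinate information after quantization. The sketch shows how a hash table keyed by integer voxel coordinates gives constant-time lookup of occupied neighboring voxels, which is the kind of lookup a local attention layer would need.

```python
from collections import defaultdict
from itertools import product
from math import floor


def voxel_key(point, voxel_size):
    """Quantize a continuous (x, y, z) point to its integer voxel coordinate."""
    return tuple(floor(c / voxel_size) for c in point)


def build_voxel_hash(points, voxel_size):
    """Map each occupied voxel coordinate to the indices of the points inside it."""
    table = defaultdict(list)
    for index, point in enumerate(points):
        table[voxel_key(point, voxel_size)].append(index)
    return dict(table)


def occupied_neighbors(table, key, radius=1):
    """Return occupied voxel keys in the cubic window around `key` (itself included).

    Each candidate is a single hash lookup, so the cost depends on the
    window size and not on the total number of points in the scene.
    """
    offsets = range(-radius, radius + 1)
    neighbors = []
    for dx, dy, dz in product(offsets, repeat=3):
        candidate = (key[0] + dx, key[1] + dy, key[2] + dz)
        if candidate in table:
            neighbors.append(candidate)
    return neighbors


def voxel_centroid(points, indices):
    """Mean position of the points in one voxel (assumed illustration of
    retaining continuous coordinates; not taken from the paper)."""
    n = len(indices)
    return tuple(sum(points[i][axis] for i in indices) / n for axis in range(3))
```

Calling `build_voxel_hash(points, 1.0)` on a list of `(x, y, z)` tuples and then `occupied_neighbors(table, key)` for any occupied key returns the voxels whose features a local attention window could aggregate over. Because the whole scene lives in one hash table, no splitting into small regions is needed for the lookup itself.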
Taxonomy
Methods: Attention Is All You Need · Linear Layer · Dropout · Layer Normalization · Label Smoothing · Byte Pair Encoding · Multi-Head Attention · Position-Wise Feed-Forward Layer · Softmax · Adam
