PointTransformerX: Portable and Efficient 3D Point Cloud Processing without Sparse Algorithms
Laurenz Reichardt, Nikolas Ebert, Oliver Wasenmüller

TL;DR
PointTransformerX is a portable, efficient 3D point cloud processing backbone that eliminates custom CUDA operators, maintains high accuracy, and runs on diverse hardware including CPUs and AMD GPUs.
Contribution
It introduces a fully PyTorch-native transformer backbone with novel 3D positional encoding and scalable attention, enhancing portability and efficiency without sacrificing accuracy.
Findings
Achieves 98.7% of PointTransformer V3's accuracy on ScanNet
Uses 79.2% fewer parameters and runs 1.6x faster than PointTransformer V3
Operates natively on NVIDIA GPUs, AMD GPUs, and CPUs
Abstract
3D point cloud perception remains tightly coupled to custom CUDA operators for spatial operations, limiting portability and efficiency on non-NVIDIA hardware such as AMD GPUs and embedded devices. We introduce PointTransformerX (PTX), a fully PyTorch-native vision transformer backbone for 3D point clouds that removes all custom CUDA operators and external libraries while retaining competitive accuracy. PTX introduces 3D-GS-RoPE, a rotary positional embedding that encodes 3D spatial relationships directly in self-attention without neighborhood construction, and replaces sparse convolutional patch embedding with a linear projection. PTX also explores inference-time scaling of attention windows to improve accuracy without retraining. With a redesigned feed-forward network, PTX achieves 98.7% of PointTransformer V3's accuracy on ScanNet with 79.2% fewer parameters and 1.6x faster execution while…
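To make the idea of a rotary positional embedding over 3D coordinates concrete, here is a minimal sketch of a generic axial 3D RoPE. It is an illustration under stated assumptions, not the authors' 3D-GS-RoPE: the abstract does not specify how 3D-GS-RoPE is constructed, so the per-axis feature split, the frequency base of 10000, and the function names `rope_rotate_1d` and `rope_rotate_3d` are all hypothetical. Plain Python is used instead of PyTorch only to keep the example dependency-free.

```python
import math

def rope_rotate_1d(vec, pos, base=10000.0):
    """Rotate consecutive feature pairs of `vec` by angles proportional to `pos`."""
    d = len(vec)
    assert d % 2 == 0, "feature dimension must be even"
    out = []
    for i in range(d // 2):
        theta = pos * base ** (-2.0 * i / d)  # one frequency per feature pair
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[2 * i], vec[2 * i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out

def rope_rotate_3d(vec, xyz, base=10000.0):
    """Assumed axial layout: one third of the features per coordinate axis."""
    d = len(vec)
    assert d % 6 == 0, "need an even number of features per axis"
    chunk = d // 3
    out = []
    for axis in range(3):
        part = vec[axis * chunk:(axis + 1) * chunk]
        out.extend(rope_rotate_1d(part, xyz[axis], base))
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Arbitrary query/key features (dimension 12) and point coordinates.
q = [0.5, -1.0, 2.0, 0.3, 1.5, -0.7, 0.2, 0.9, -1.1, 0.4, 0.8, -0.6]
k = [1.0, 0.2, -0.5, 1.3, 0.7, 0.1, -0.9, 0.6, 0.3, -1.2, 0.5, 1.1]
p_q, p_k = (0.4, 1.0, -0.3), (1.1, 0.2, 0.5)

# Translate both points by the same offset.
shift = (2.0, -3.0, 0.7)
p_q_shifted = tuple(a + b for a, b in zip(p_q, shift))
p_k_shifted = tuple(a + b for a, b in zip(p_k, shift))

score = dot(rope_rotate_3d(q, p_q), rope_rotate_3d(k, p_k))
score_shifted = dot(rope_rotate_3d(q, p_q_shifted), rope_rotate_3d(k, p_k_shifted))

# The attention logit depends only on the relative offset p_k - p_q.
print(abs(score - score_shifted) < 1e-9)
```

Because each feature pair is rotated by an angle that is linear in the coordinate, the query-key dot product depends only on the coordinate difference between the two points. This is the general property that lets a rotary embedding carry spatial relationships inside self-attention without building explicit neighborhoods.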
