Flash3D: Super-scaling Point Transformers through Joint Hardware-Geometry Locality
Liyan Chen, Gregory P. Meyer, Zaiwei Zhang, Eric M. Wolff, Paul Vernaza

TL;DR
Flash3D Transformer unifies geometric locality and GPU architecture to enable super-scaling of point cloud models, delivering a 2.25x speed increase and a 2.4x memory efficiency boost over PTv3.
Contribution
Introduces a novel locality mechanism based on Perfect Spatial Hashing that aligns geometric locality with GPU tiling, enabling scalable and efficient point cloud transformers.
Findings
2.25x speed increase over state-of-the-art PTv3
2.4x memory efficiency boost
Higher task accuracies at same compute budget
Abstract
Recent efforts recognize the power of scale in 3D learning (e.g. PTv3) and attention mechanisms (e.g. FlashAttention). However, current point cloud backbones fail to holistically unify geometric locality, attention mechanisms, and GPU architectures in one view. In this paper, we introduce Flash3D Transformer, which aligns geometric locality and GPU tiling through a principled locality mechanism based on Perfect Spatial Hashing (PSH). The common alignment with GPU tiling naturally fuses our PSH locality mechanism with FlashAttention at negligible extra cost. This mechanism affords flexible design choices throughout the backbone that result in superior downstream task results. Flash3D outperforms state-of-the-art PTv3 results on benchmark datasets, delivering a 2.25x speed increase and 2.4x memory efficiency boost. This efficiency enables scaling to wider attention scopes and larger…
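The idea of aligning geometric locality with fixed-size GPU tiles can be illustrated with a small sketch. This is not the authors' Perfect Spatial Hashing construction: it uses an ordinary voxel-keyed dictionary, and the names `voxel_key`, `bucket_points`, `make_tiles`, `voxel_size`, and `tile_size` are all hypothetical. It only shows how spatially nearby points might end up in the same fixed-size group, which is the kind of group an attention kernel could then process as one tile.

```python
from collections import defaultdict


def voxel_key(point, voxel_size):
    """Quantize a 3D point to integer voxel coordinates (floor division)."""
    return tuple(int(c // voxel_size) for c in point)


def bucket_points(points, voxel_size):
    """Group point indices by the voxel they fall into.

    Assumption: a plain dict stands in for the hash table; a perfect
    spatial hash would instead guarantee collision-free slots.
    """
    buckets = defaultdict(list)
    for idx, p in enumerate(points):
        buckets[voxel_key(p, voxel_size)].append(idx)
    return buckets


def make_tiles(points, voxel_size, tile_size):
    """Order points voxel by voxel, then cut the ordering into fixed-size tiles."""
    buckets = bucket_points(points, voxel_size)
    # Visiting voxels in sorted order keeps points from the same voxel adjacent.
    order = [i for key in sorted(buckets) for i in buckets[key]]
    return [order[s:s + tile_size] for s in range(0, len(order), tile_size)]


points = [(0.1, 0.2, 0.0), (0.3, 0.1, 0.2), (5.0, 5.1, 5.2), (5.3, 5.0, 5.4)]
tiles = make_tiles(points, voxel_size=1.0, tile_size=2)
print(tiles)  # [[0, 1], [2, 3]]
```

In this toy input the two points near the origin share one tile and the two points near (5, 5, 5) share the other, so attention restricted to a tile would only mix geometrically close points. Sorting voxel keys lexicographically is a simplification chosen for brevity and says nothing about the ordering Flash3D actually uses.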
Taxonomy
Topics: Optical measurement and interference techniques · Surface Roughness and Optical Measurements · Advanced Optical Imaging Technologies
Methods: Attention Is All You Need · Linear Layer · Byte Pair Encoding · Absolute Position Encodings · Dense Connections · Multi-Head Attention · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Position-Wise Feed-Forward Layer · Label Smoothing · Residual Connection
