Flash3D: Super-scaling Point Transformers through Joint Hardware-Geometry Locality

Liyan Chen; Gregory P. Meyer; Zaiwei Zhang; Eric M. Wolff; Paul Vernaza

arXiv:2412.16481·cs.CV·December 24, 2024

TL;DR

Flash3D Transformer unifies geometric locality and GPU architecture to enable super-scaling of point cloud models, achieving significant speed and memory efficiency improvements over previous methods.

Contribution

Introduces a novel locality mechanism based on Perfect Spatial Hashing that aligns geometric locality with GPU tiling, enabling scalable and efficient point cloud transformers.

Findings

01  2.25x speed increase over state-of-the-art PTv3

02  2.4x memory efficiency boost

03  Higher task accuracies at same compute budget

Abstract

Recent efforts recognize the power of scale in 3D learning (e.g. PTv3) and attention mechanisms (e.g. FlashAttention). However, current point cloud backbones fail to holistically unify geometric locality, attention mechanisms, and GPU architectures in one view. In this paper, we introduce Flash3D Transformer, which aligns geometric locality and GPU tiling through a principled locality mechanism based on Perfect Spatial Hashing (PSH). The common alignment with GPU tiling naturally fuses our PSH locality mechanism with FlashAttention at negligible extra cost. This mechanism affords flexible design choices throughout the backbone that result in superior downstream task results. Flash3D outperforms state-of-the-art PTv3 results on benchmark datasets, delivering a 2.25x speed increase and 2.4x memory efficiency boost. This efficiency enables scaling to wider attention scopes and larger…
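To make the idea of aligning geometric locality with fixed-size compute tiles more concrete, here is a minimal sketch. It is not the paper's Perfect Spatial Hashing mechanism: as a stand-in, it simply sorts points by integer voxel coordinates and cuts them into fixed-size buckets, so that each bucket holds spatially nearby points and could in principle be processed as one attention tile. The names `voxel_key` and `bucket_points` and the parameters `voxel_size` and `bucket_size` are hypothetical and chosen only for illustration.

```python
def voxel_key(point, voxel_size):
    """Integer voxel coordinates of a 3D point (floor division per axis)."""
    return tuple(int(c // voxel_size) for c in point)


def bucket_points(points, voxel_size, bucket_size):
    """Group point indices into fixed-size buckets of spatially nearby points.

    Assumption: a plain lexicographic sort on voxel keys stands in for the
    hashing-based locality mechanism described in the abstract.
    The last bucket may hold fewer than `bucket_size` points.
    """
    order = sorted(range(len(points)),
                   key=lambda i: voxel_key(points[i], voxel_size))
    return [order[i:i + bucket_size]
            for i in range(0, len(order), bucket_size)]


# Two clusters of points, interleaved in the input order.
points = [(0.1, 0.2, 0.0), (5.0, 5.1, 5.2), (0.3, 0.1, 0.2), (5.2, 5.0, 5.1)]
buckets = bucket_points(points, voxel_size=1.0, bucket_size=2)
print(buckets)  # [[0, 2], [1, 3]]
```

Each bucket ends up with the indices of one cluster, so attention restricted to a bucket only mixes nearby points. A fixed bucket size is what would let such groups map onto equally sized GPU tiles; how Flash3D actually constructs and balances its buckets is described in the paper and the official repository, not here.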

Code & Models

Repositories

liyanc/flash3dtransformer (PyTorch, official)

Taxonomy

Topics: Optical measurement and interference techniques · Surface Roughness and Optical Measurements · Advanced Optical Imaging Technologies

Methods: Attention Is All You Need · Linear Layer · Byte Pair Encoding · Absolute Position Encodings · Dense Connections · Multi-Head Attention · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Position-Wise Feed-Forward Layer · Label Smoothing · Residual Connection