RelFlexformer: Efficient Attention 3D-Transformers for Integrable Relative Positional Encodings
Byeongchan Kim, Arijit Sehanobish, Avinava Dubey, Min-hwan Oh, Krzysztof Choromanski

TL;DR
RelFlexformer introduces a flexible, efficient 3D-Transformer model that uses universal relative positional encodings and computes attention in $O(L \log L)$ time, making it suitable for non-structured 3D data such as point clouds.
Contribution
It generalizes efficient RPE-attention methods using NU-FFT theory, enabling application to arbitrary 3D token distributions and improving performance on 3D datasets.
Findings
Achieves $O(L \log L)$ attention computation time.
Extensive experiments show quality improvements on 3D datasets.
Generalizes existing RPE methods to non-structured 3D data.
Abstract
We present a new class of efficient attention mechanisms applying universal 3D Relative Positional Encoding (RPE) methods given by arbitrary integrable modulation functions. They lead to a new class of 3D-Transformer models, called RelFlexformers, flexibly integrating those RPEs, and characterized by the $O(L \log L)$ time complexity of the attention computation for $L$-length input sequences. RelFlexformers build on the theory of the Non-Uniform Fourier Transform (NU-FFT), naturally generalizing several existing efficient RPE-attention methods from structured settings, with tokens homogeneously embedded in unweighted grids, to general non-structured heterogeneous scenarios, where tokens' positions are arbitrarily distributed in the corresponding 3D spaces. As such, RelFlexformers can be applied in particular to model point clouds. Our extensive empirical evaluation…
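To make the role of the Fourier transform concrete, here is a minimal NumPy sketch of why an integrable modulation function allows a structured product with the matrix of pairwise values $f(r_i - r_j)$ at arbitrary 3D positions. It is an illustration under stated assumptions, not the authors' implementation: the Gaussian modulation function, the width `sigma`, the frequency grid (`w_max`, `step`), and all function names are hypothetical choices made for this example. The two non-uniform Fourier sums are computed here as dense matrix products; an NU-FFT routine would replace exactly those two products to reach the faster running time.

```python
import numpy as np

def gaussian(x, sigma):
    # Assumed modulation function f(x) = exp(-|x|^2 / (2 sigma^2)) on R^3.
    return np.exp(-np.sum(x**2, axis=-1) / (2.0 * sigma**2))

def gaussian_hat(w, sigma):
    # Fourier transform of f under the convention
    # f(x) = integral of f_hat(w) * exp(2*pi*i * w.x) dw.
    return (2.0 * np.pi * sigma**2) ** 1.5 * np.exp(
        -2.0 * np.pi**2 * sigma**2 * np.sum(w**2, axis=-1)
    )

def rpe_matvec_direct(pos, v, sigma):
    # Quadratic baseline: build the full L x L matrix f(r_i - r_j).
    diff = pos[:, None, :] - pos[None, :, :]
    return gaussian(diff, sigma) @ v

def rpe_matvec_fourier(pos, v, sigma, w_max=4.0, step=0.25):
    # Discretize the Fourier integral on a regular frequency grid (an assumption
    # that works here because positions lie in the unit cube and f_hat decays fast).
    axis = np.arange(-w_max, w_max + step / 2, step)
    w = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1).reshape(-1, 3)
    weights = gaussian_hat(w, sigma) * step**3           # quadrature weights, shape (K,)
    phase = np.exp(2j * np.pi * (pos @ w.T))             # exp(2*pi*i * w_k.r_i), shape (L, K)
    # Step 1 (non-uniform points -> frequencies): sum_j exp(-2*pi*i * w_k.r_j) v_j.
    coeffs = phase.conj().T @ v                          # shape (K, d)
    # Step 2 (frequencies -> non-uniform points): weight by f_hat and evaluate at r_i.
    return (phase @ (weights[:, None] * coeffs)).real    # shape (L, d)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pos = rng.random((40, 3))            # arbitrary, non-grid 3D token positions
    v = rng.standard_normal((40, 4))     # per-token feature vectors
    exact = rpe_matvec_direct(pos, v, sigma=0.3)
    approx = rpe_matvec_fourier(pos, v, sigma=0.3)
    print("max abs difference:", np.max(np.abs(exact - approx)))
```

The point of the sketch is the factorization: the pairwise matrix is never formed in `rpe_matvec_fourier`, and each token's position enters only through its own phase factors. How the resulting product is combined with queries, keys, and values inside the attention layer is not specified in this summary and is left out on purpose.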
