ENA: Efficient N-dimensional Attention
Yibo Zhong

TL;DR
This paper introduces ENA, a hybrid architecture that combines linear recurrence with high-order sliding window attention (SWA) to model ultra-long sequences of high-order data efficiently.
Contribution
The paper proposes ENA, an N-dimensional attention architecture that pairs linear recurrence with tiled high-order sliding window attention (SWA) for efficient modeling of high-order data.
Findings
High-order SWA is efficient both theoretically and practically.
Attention-hybrid models outperform scanning strategies in experiments.
ENA effectively models ultra-long high-order data sequences.
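To make "high-order SWA" concrete, here is a minimal NumPy sketch of the 2D case: each query position on an H x W grid attends only to keys within a square window around it. The function names (`sliding_window_mask_2d`, `masked_attention`) and the dense-mask formulation are assumptions chosen for illustration. This is not the paper's implementation, and it does not reproduce the tiled variant the abstract describes; it only shows which positions a window of a given size covers.

```python
import numpy as np

def sliding_window_mask_2d(height, width, window):
    """Boolean (H*W, H*W) mask for a 2D sliding window (illustrative only).

    Entry [i, j] is True when flattened key j lies within `window // 2`
    steps of flattened query i along both the row and the column axis.
    """
    rows, cols = np.meshgrid(np.arange(height), np.arange(width), indexing="ij")
    rows = rows.reshape(-1)
    cols = cols.reshape(-1)
    radius = window // 2
    near_rows = np.abs(rows[:, None] - rows[None, :]) <= radius
    near_cols = np.abs(cols[:, None] - cols[None, :]) <= radius
    return near_rows & near_cols

def masked_attention(q, k, v, mask):
    """Softmax attention restricted to the positions allowed by `mask`."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)
    # Every query can see itself, so each row has a finite maximum.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Hypothetical sizes: a 4 x 4 grid, window of 3, feature dimension 8.
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((16, 8)) for _ in range(3))
mask = sliding_window_mask_2d(4, 4, 3)
out = masked_attention(q, k, v, mask)
```

The dense mask costs memory quadratic in the number of positions, so it is useful only for checking the attention pattern on small grids. An efficient implementation would avoid materializing it.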
Abstract
Efficient modeling of long sequences of high-order data requires a more efficient architecture than the Transformer. In this paper, we investigate two key aspects of extending linear recurrent models, especially those originally designed for language modeling, to high-order data (1D to ND): scanning strategies and attention-hybrid architectures. Empirical results suggest that scanning provides limited benefits, while attention-hybrid models yield promising results. Focusing on the latter, we further evaluate types of attention and find that tiled high-order sliding window attention (SWA) is efficient in both theory and practice. We term the resulting hybrid architecture of linear recurrence and high-order SWA Efficient N-dimensional Attention (ENA). We then conduct several experiments to demonstrate its effectiveness. The intuition behind ENA is that linear recurrence compresses global…
Taxonomy
Topics: Neural Networks and Applications · Advanced Neural Network Applications · Brain Tumor Detection and Classification
