Wave-Attractor-Tree: A Hierarchical Binary Tree Reduction Architecture for Efficient Sequence Modeling
Igor Berezkin

TL;DR
This paper proposes Wave-Attractor-Tree, a hierarchical binary tree reduction architecture that replaces self-attention. It models sequences efficiently, converging faster and reaching higher accuracy on long-range dependencies than standard Transformers.
Contribution
It introduces a recursive Gated Linear Unit (GLU) merge operation inside a binary tree structure. This replaces the pairwise interactions of self-attention with O(n) merge operations and improves performance over standard Transformers.
Findings
Outperforms standard Transformers in convergence speed
Achieves higher accuracy on long-range structural dependencies
Uses O(n) space and O(log n) parallel depth
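For intuition, the counts can be checked on a perfect binary tree. The assumptions here are ours: n is a power of two, and each merge maps two d-dimensional children to one d-dimensional parent at O(d^2) cost, as a GLU with d-by-2d weight matrices would.

```latex
\begin{aligned}
\#\text{merges} &= \sum_{\ell=1}^{\log_2 n} \frac{n}{2^{\ell}} = n - 1 = O(n) \\
\text{parallel depth} &= \log_2 n = O(\log n) \\
\text{total work} &= (n - 1)\, O(d^2) = O(n d^2) \\
\#\text{stored nodes} &= n + (n - 1) = 2n - 1 = O(n)
\end{aligned}
```

For n = 8, the three levels perform 4 + 2 + 1 = 7 merges. Each stored node is a d-dimensional vector, so the O(n) space figure treats d as a constant.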
Abstract
This work introduces a hierarchical, binary-tree-based reduction that replaces standard self-attention. The core idea is a recursive Gated Linear Unit (GLU) merge operation. It requires O(n) total merge operations, O(log n) parallel depth, O(n d^2) total work, and O(n) space, where n is the sequence length and d the hidden dimension. In the reported experiments, the model significantly outperforms standard Transformers in both convergence speed and accuracy on long-range structural dependencies, particularly on tasks where a hierarchical inductive bias is critical.
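The recursive merge can be sketched in plain Python. This is a minimal illustration under our own assumptions, not the author's implementation. The class GLUMerge, the function tree_reduce, the Gaussian weight initialization, and the rule that an unpaired node at the end of a level is carried up unchanged are all hypothetical. Each merge concatenates the two child vectors and applies a gated linear unit, value * sigmoid(gate), mapping 2d inputs back to d outputs.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def affine(w: Matrix, x: Vector, b: Vector) -> Vector:
    """Compute w @ x + b for a (d x 2d) matrix w and a 2d-dimensional input x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(w, b)]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class GLUMerge:
    """Hypothetical GLU merge: two d-dim children -> one d-dim parent.

    parent = (W_v [l; r] + b_v) * sigmoid(W_g [l; r] + b_g)
    Each call does two d x 2d matrix-vector products, i.e. O(d^2) work.
    """

    def __init__(self, d: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(2 * d)
        self.w_v = [[rng.gauss(0.0, scale) for _ in range(2 * d)] for _ in range(d)]
        self.w_g = [[rng.gauss(0.0, scale) for _ in range(2 * d)] for _ in range(d)]
        self.b_v = [0.0] * d
        self.b_g = [0.0] * d

    def __call__(self, left: Vector, right: Vector) -> Vector:
        x = left + right  # list concatenation [l; r], length 2d
        value = affine(self.w_v, x, self.b_v)
        gate = affine(self.w_g, x, self.b_g)
        return [v * sigmoid(g) for v, g in zip(value, gate)]


def tree_reduce(leaves: List[Vector], merge: GLUMerge) -> Tuple[Vector, int, int]:
    """Merge adjacent pairs level by level until one root vector remains.

    Returns (root, total_merges, levels). An unpaired last node is carried
    up unchanged (an assumption; the paper may pad or handle it differently).
    """
    if not leaves:
        raise ValueError("need at least one token")
    level, merges, depth = list(leaves), 0, 0
    while len(level) > 1:
        parents = []
        # Merges within one level are independent and could run in parallel.
        for i in range(0, len(level) - 1, 2):
            parents.append(merge(level[i], level[i + 1]))
            merges += 1
        if len(level) % 2 == 1:
            parents.append(level[-1])
        level, depth = parents, depth + 1
    return level[0], merges, depth


d = 4
tokens = [[float(t + k) for k in range(d)] for t in range(8)]
root, merges, depth = tree_reduce(tokens, GLUMerge(d, seed=0))
print(len(root), merges, depth)  # 4 7 3
```

Every merge reduces the node count by one, so n tokens always take exactly n - 1 merges. Because each level halves the count (rounding up), the loop runs ceil(log2 n) times. Merges within a level are independent, which is where the O(log n) parallel depth comes from. This sketch simply runs them sequentially.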
Taxonomy
Topics: Graph Theory and Algorithms · Natural Language Processing Techniques · Parallel Computing and Optimization Techniques
