Wave-Attractor-Tree: A Hierarchical Binary Tree Reduction Architecture for Efficient Sequence Modeling

Igor Berezkin

arXiv:2603.00812·cs.LG·March 3, 2026

TL;DR

This paper proposes Wave-Attractor-Tree, a hierarchical binary tree reduction architecture that replaces self-attention. In the authors' experiments it converges faster than standard Transformers and is more accurate on long-range structural dependencies.

Contribution

It introduces a recursive Gated Linear Unit (GLU) merge operation applied within a binary tree, which needs O(n) merge operations at O(log n) parallel depth and, in the reported experiments, outperforms standard Transformers.
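The recursive merge described above can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the paper's implementation: the function names (`make_params`, `glu_merge`, `tree_reduce`) are hypothetical, the merge uses the standard GLU form (a linear "value" projection multiplied elementwise by a sigmoid "gate" projection) over the concatenated children, the same weights are shared across all tree levels, and an unpaired node is simply carried up to the next level.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def matvec(W, x):
    # Dense matrix-vector product on plain Python lists.
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def make_params(d, seed=0):
    # Hypothetical parameters: two (d x 2d) projections, one for the
    # value path and one for the gate path of the GLU.
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(2 * d)
    W_val = [[rng.gauss(0, scale) for _ in range(2 * d)] for _ in range(d)]
    W_gate = [[rng.gauss(0, scale) for _ in range(2 * d)] for _ in range(d)]
    return W_val, W_gate


def glu_merge(left, right, params):
    # Assumed merge: concatenate the two children (length 2d), then
    # apply a GLU, value(x) * sigmoid(gate(x)), to get a length-d parent.
    W_val, W_gate = params
    x = left + right
    value = matvec(W_val, x)
    gate = [sigmoid(g) for g in matvec(W_gate, x)]
    return [v * g for v, g in zip(value, gate)]


def tree_reduce(leaves, params):
    # Reduce the sequence level by level. Merges within one level are
    # independent of each other, so they could run in parallel.
    nodes = list(leaves)
    merges = 0
    depth = 0
    while len(nodes) > 1:
        parents = []
        for i in range(0, len(nodes) - 1, 2):
            parents.append(glu_merge(nodes[i], nodes[i + 1], params))
            merges += 1
        if len(nodes) % 2 == 1:
            parents.append(nodes[-1])  # assumption: carry the unpaired node up
        nodes = parents
        depth += 1
    return nodes[0], merges, depth


d = 4
leaves = [[0.1 * i] * d for i in range(8)]
root, merges, depth = tree_reduce(leaves, make_params(d))
print(len(root), merges, depth)  # 4 7 3
```

For 8 leaves the sketch performs 7 merges over 3 levels, matching the n - 1 merges and log2(n) depth of a balanced binary reduction. How the paper handles odd lengths, weight sharing, or per-position outputs is not stated in this summary.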

Findings

01 Outperforms standard Transformers in convergence speed

02 Achieves higher accuracy on long-range structural dependencies

03 Uses O(n) space and O(log n) parallel depth

Abstract

This work introduces a hierarchical binary-tree reduction that replaces standard self-attention. The core idea is a recursive Gated Linear Unit (GLU) merge operation, which yields O(n) total merge operations, O(log n) parallel depth, O(n d^2) total work, and O(n) space complexity. In the reported experiments, the model significantly outperforms standard Transformers in both convergence speed and accuracy on long-range structural dependencies, specifically where a hierarchical inductive bias is critical.
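The stated bounds follow from a short count over a balanced binary tree. The derivation below is a sketch under assumptions not spelled out in this summary: n is a power of two, d is the hidden dimension, and each merge applies dense projections from 2d to d inputs, costing O(d^2).

```latex
% Level k (k = 1, ..., log_2 n) performs n / 2^k merges.
\[
  \text{merges} \;=\; \sum_{k=1}^{\log_2 n} \frac{n}{2^{k}} \;=\; n - 1 \;=\; O(n),
  \qquad
  \text{depth} \;=\; \log_2 n \;=\; O(\log n).
\]
% Each merge costs O(d^2), so the total work is
\[
  \text{work} \;=\; (n - 1)\cdot O(d^{2}) \;=\; O(n\,d^{2}).
\]
% Storing every level keeps n + n/2 + \dots + 1 < 2n node vectors,
% which is O(n) nodes (O(n d) scalars for fixed d).
```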

Taxonomy

Topics: Graph Theory and Algorithms · Natural Language Processing Techniques · Parallel Computing and Optimization Techniques