Don't Pay Attention

Mohammad Hammoud; Devang Acharya

arXiv:2506.11305 · cs.CL · November 18, 2025

TL;DR

Avey is a new neural architecture that processes arbitrarily long sequences efficiently by selecting only the tokens relevant to each token. It outperforms Transformers, especially on long-range dependency tasks, while remaining competitive on short-range ones.

Contribution

Avey introduces a new architecture that decouples sequence length from context width, enabling efficient long-range processing without attention or recurrence.

Findings

01

Avey outperforms Transformers on long-range dependency tasks.

02

Avey matches Transformer performance on short-range NLP benchmarks.

03

Avey processes arbitrarily long sequences efficiently.

Abstract

The Transformer has become the de facto standard for modern language models owing to its parallelizable training and effective autoregressive decoding. However, its fixed context window and the quadratic time and memory costs of its self-attention mechanism remain central bottlenecks. These constraints have revived interest in recurrent architectures that scale linearly with sequence length, but at the cost of reduced parallelism. In this paper, we introduce Avey, a new foundational architecture that breaks away from both attention and recurrence. Avey pairs a ranker with an autoregressive neural processor to select and contextualize only the most relevant tokens for any given token. Specifically, it decouples sequence length from context width, thus enabling effective and efficient processing of arbitrarily long sequences. Results show that Avey compares favorably to the Transformer…
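The abstract's core idea can be illustrated with a toy sketch: a ranker picks a bounded set of relevant earlier tokens for each token, and only those tokens are used to contextualize it. This is not Avey's actual ranker or neural processor. The cosine-similarity scoring, the names `rank_context` and `contextualize`, the parameter `k`, and the 50/50 averaging step are all hypothetical stand-ins chosen for illustration.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rank_context(seq: List[Vector], t: int, k: int) -> List[int]:
    """Hypothetical ranker: indices of the k preceding tokens most similar
    to token t, returned in sequence order. Only tokens before t are scored
    (causal). Cosine similarity is an assumed stand-in scoring function."""
    scored = [(cosine(seq[i], seq[t]), i) for i in range(t)]
    # Highest score first; ties broken by earlier position.
    scored.sort(key=lambda p: (-p[0], p[1]))
    return sorted(i for _, i in scored[:k])


def contextualize(seq: List[Vector], t: int, k: int) -> Vector:
    """Hypothetical processor: average token t with the mean of its k
    selected context tokens. A real model would use a learned network here."""
    idx = rank_context(seq, t, k)
    if not idx:
        return list(seq[t])  # no preceding tokens: return the token unchanged
    dim = len(seq[t])
    mean = [sum(seq[i][d] for i in idx) / len(idx) for d in range(dim)]
    return [0.5 * x + 0.5 * m for x, m in zip(seq[t], mean)]


seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1], [0.0, 1.0], [1.0, 0.0]]
print(rank_context(seq, t=4, k=2))  # → [0, 2]
```

In this toy version the ranker still scans every preceding token, but the contextualization step only ever sees `k` of them, so the context width stays fixed no matter how long the sequence grows.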

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis

Methods: Dropout · Dense Connections · Absolute Position Encodings · Layer Normalization · Byte Pair Encoding · Label Smoothing · Softmax · Transformer