Sliced ReLU attention: Quasi-linear contextual expressivity via sorting

François-Xavier Vialard (LIGM); Siwan Boufadène (LIGM)

arXiv:2512.11411 · cs.LG · February 5, 2026

Open Access

TL;DR

This paper introduces sliced ReLU attention, an attention mechanism that sorts one-dimensional projections of key-query differences to reach quasi-linear O(n log(n)) complexity while retaining strong expressive power, making it suitable for long sequences.

Contribution

The paper proposes sliced ReLU attention, an attention mechanism with quasi-linear complexity and proven in-context expressivity that departs structurally from softmax attention and its approximations.

Findings

01. Achieves O(n log(n)) complexity via sorting.

02. Maintains sequence-to-sequence disentangling ability.

03. Satisfies a universal approximation property.

Abstract

We introduce sliced ReLU attention, a new attention mechanism that departs structurally from both softmax and its approximation alternatives. Instead of applying a nonlinearity to pairwise dot products, we operate on one-dimensional projections of key-query differences and leverage sorting to obtain quasi-linear complexity. This construction yields a differentiable, non-symmetric kernel that can be computed in O(n log(n)) through a sorting procedure, making it suitable for very long contexts. Beyond computational benefits, the model retains strong theoretical expressive power: we establish two in-context expressivity results, previously known for softmax attention, showing that sliced ReLU attention preserves the ability to perform nontrivial sequence-to-sequence disentangling tasks and satisfies a contextual universal approximation property. Finally, we illustrate the potential…
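To make the sorting idea concrete, here is a minimal Python sketch of why a ReLU applied to one-dimensional differences admits a quasi-linear evaluation. The abstract does not state the kernel formula, so the form used here is an assumption: for scalar projections a[i] (queries) and b[j] (keys) and scalar values v[j], it computes out[i] = sum_j ReLU(a[i] - b[j]) * v[j]. The function names, the scalar values, and the absence of any normalization are all hypothetical choices made for illustration, not the authors' implementation. The key observation is that only keys with b[j] < a[i] contribute, so after one sort the sum splits into two prefix sums.

```python
from bisect import bisect_left
from itertools import accumulate


def relu_sum_naive(a, b, v):
    """O(n*m) reference: out[i] = sum_j max(a[i] - b[j], 0) * v[j]."""
    return [sum(max(ai - bj, 0.0) * vj for bj, vj in zip(b, v)) for ai in a]


def relu_sum_sorted(a, b, v):
    """Same quantity in O((n + m) log m) with one sort and two prefix sums.

    Assumed kernel form (not taken from the paper): ReLU(a[i] - b[j]).
    """
    # Sort key indices by their projected value b[j].
    order = sorted(range(len(b)), key=lambda j: b[j])
    b_sorted = [b[j] for j in order]
    # Prefix sums of v and of b*v in sorted order, each with a leading 0.
    s0 = [0.0] + list(accumulate(v[j] for j in order))
    s1 = [0.0] + list(accumulate(b[j] * v[j] for j in order))
    out = []
    for ai in a:
        # r = number of keys with b[j] < a[i]; only these pass the ReLU.
        r = bisect_left(b_sorted, ai)
        # sum over those keys of (a[i] - b[j]) * v[j]
        out.append(ai * s0[r] - s1[r])
    return out


queries = [0.5, -1.0, 2.0]
keys = [0.0, 1.0, -2.0]
values = [1.0, 2.0, 3.0]
print(relu_sum_sorted(queries, keys, values))
```

For vector-valued values, the same prefix sums can be kept per coordinate; averaging over several projection directions would be one plausible reading of "sliced", though that too is an assumption here.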

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Advanced Graph Neural Networks