Sliced ReLU attention: Quasi-linear contextual expressivity via sorting
François-Xavier Vialard (LIGM), Siwan Boufadène (LIGM)

TL;DR
This paper introduces sliced ReLU attention, an attention mechanism that sorts one-dimensional projections of key-query differences to reach quasi-linear complexity while retaining strong expressive power, making it suitable for long sequences.
Contribution
The paper proposes sliced ReLU attention, a new attention method with quasi-linear complexity and proven expressive capabilities, differing structurally from softmax-based approaches.
Findings
Achieves O(n log(n)) complexity via sorting.
Maintains sequence-to-sequence disentangling ability.
Satisfies a contextual universal approximation property.
Abstract
We introduce sliced ReLU attention, a new attention mechanism that departs structurally from both softmax and its approximation alternatives. Instead of applying a nonlinearity to pairwise dot products, we operate on one-dimensional projections of key-query differences and leverage sorting to obtain quasi-linear complexity. This construction yields a differentiable, non-symmetric kernel that can be computed in O(n log(n)) through a sorting procedure, making it suitable for very long contexts. Beyond computational benefits, the model retains strong theoretical expressive power: we establish two in-context expressivity results, previously known for softmax attention, showing that sliced ReLU attention preserves the ability to perform nontrivial sequence-to-sequence disentangling tasks and satisfies a contextual universal approximation property. Finally, we illustrate the potential…
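To make the sorting argument concrete, here is a minimal NumPy sketch of how a ReLU kernel applied to one-dimensional key-query differences can be summed in O(n log(n)) rather than O(n^2). It is an illustration under stated assumptions, not the authors' implementation: it assumes queries and keys have already been projected to scalars along a single direction, and that the output for query i is sum_j ReLU(q_i - k_j) v_j with no normalization. The function names `relu_kernel_sum_naive` and `relu_kernel_sum_sorted` are hypothetical. The trick is that, once keys are sorted, the keys below a given query form a prefix, so the sum splits into q_i times a prefix sum of values minus a prefix sum of key-weighted values.

```python
import numpy as np


def relu_kernel_sum_naive(q, k, v):
    """O(n^2) reference: out[i] = sum_j relu(q[i] - k[j]) * v[j]."""
    weights = np.maximum(q[:, None] - k[None, :], 0.0)  # shape (n_q, n_k)
    return weights @ v


def relu_kernel_sum_sorted(q, k, v):
    """Same quantity in O(n log n) using a sort, prefix sums, and binary search.

    Assumption (not from the paper): q and k are 1-D arrays of scalar
    projections and v has shape (n_k, d).
    """
    order = np.argsort(k)                 # O(n log n) sort of the keys
    k_sorted = k[order]
    v_sorted = v[order]

    # Prefix sums with a leading zero row, so index m means "first m keys".
    zero = np.zeros((1, v.shape[1]))
    cum_v = np.concatenate([zero, np.cumsum(v_sorted, axis=0)])
    cum_kv = np.concatenate(
        [zero, np.cumsum(k_sorted[:, None] * v_sorted, axis=0)]
    )

    # For each query, count the keys strictly below it (binary search).
    # Keys equal to the query contribute relu(0) = 0, so ties are harmless.
    m = np.searchsorted(k_sorted, q, side="left")

    # sum_{k_j < q_i} (q_i - k_j) v_j = q_i * sum v_j - sum k_j v_j
    return q[:, None] * cum_v[m] - cum_kv[m]


rng = np.random.default_rng(0)
q = rng.normal(size=200)
k = rng.normal(size=200)
v = rng.normal(size=(200, 4))
fast = relu_kernel_sum_sorted(q, k, v)
slow = relu_kernel_sum_naive(q, k, v)
```

In this sketch, `fast` and `slow` agree up to floating-point error. A "sliced" variant would presumably repeat the computation over several projection directions and combine the results, but that step is left out here because the summary above does not specify it.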
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Advanced Graph Neural Networks
