Positional LSH: Binary Block Matrix Approximation for Attention with Linear Biases

Daniel Wolfson; Tal Wagner

arXiv:2605.09472 · cs.LG · May 12, 2026

TL;DR

This paper uses locality-sensitive hashing (LSH) to formally connect positional bias and attention masks in transformers, and derives an efficient approximation of ALiBi-biased attention with theoretical guarantees and empirical validation.

Contribution

The paper presents a positional LSH scheme whose sampled block-diagonal binary masks approximate the ALiBi bias matrix, enabling near-linear-time attention with provable accuracy guarantees.
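
As a rough illustration of how a scheme of this kind could work, the sketch below assumes (the summary does not confirm this) that block boundaries are placed independently between adjacent positions with probability 1 - exp(-m), where m is the ALiBi slope. Under that assumption, positions i and j fall in the same block with probability exp(-m|i-j|), which is the exponentiated, symmetric form of the ALiBi bias. All function names are hypothetical, and the causal restriction that ALiBi uses in practice is left out for brevity.

```python
import math
import random


def sample_block_mask(n, slope, rng):
    """Sample one contiguous block-diagonal 0/1 mask over n positions.

    Assumption (not taken from the paper): a cut is placed between
    positions i-1 and i independently with probability 1 - exp(-slope).
    """
    cut_prob = 1.0 - math.exp(-slope)
    block_id = [0] * n
    for i in range(1, n):
        # Start a new block whenever a cut is drawn.
        block_id[i] = block_id[i - 1] + (1 if rng.random() < cut_prob else 0)
    # mask[i][j] is 1 exactly when i and j share a block.
    return [[1 if block_id[i] == block_id[j] else 0 for j in range(n)]
            for i in range(n)]


def alibi_kernel(n, slope):
    """Exponentiated symmetric linear bias: exp(-slope * |i - j|)."""
    return [[math.exp(-slope * abs(i - j)) for j in range(n)]
            for i in range(n)]


def empirical_mean_mask(n, slope, num_samples, seed=0):
    """Average num_samples sampled masks entry by entry."""
    rng = random.Random(seed)
    total = [[0] * n for _ in range(n)]
    for _ in range(num_samples):
        mask = sample_block_mask(n, slope, rng)
        for i in range(n):
            for j in range(n):
                total[i][j] += mask[i][j]
    return [[total[i][j] / num_samples for j in range(n)] for i in range(n)]


def max_abs_error(a, b):
    """Largest entrywise absolute difference between two matrices."""
    return max(abs(x - y) for row_a, row_b in zip(a, b)
               for x, y in zip(row_a, row_b))


if __name__ == "__main__":
    mean = empirical_mean_mask(n=8, slope=0.5, num_samples=5000)
    print(max_abs_error(mean, alibi_kernel(8, 0.5)))
```

The printed value is the max-norm gap between the empirical mean and the target matrix; it should shrink as `num_samples` grows, since each entry is an average of independent 0/1 draws.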

Findings

1. Spectral and max-norm approximation guarantees for the binary mask scheme.

2. High-probability uniform approximation of ALiBi-biased attention outputs.

3. Empirical validation on large language models supports theoretical results.

Abstract

Positional encoding in transformers is commonly implemented through positional embeddings, attention masks, or bias terms, but formal connections between these mechanisms remain limited. We study attention with positional bias through the lens of locality-sensitive hashing (LSH), focusing on Attention with Linear Biases (ALiBi). We show that the ALiBi bias matrix is the expectation of contiguous block-diagonal binary masks induced by a "positional LSH" scheme. The empirical mean of masks sampled from this scheme yields spectral norm and max-norm approximation guarantees with bounded block sizes with high probability. This structural theorem implies a uniform approximation theorem for ALiBi-biased attention: with high probability over the sampled masks, the approximate attention output is accurate simultaneously for all query-key-value inputs and can be computed in near-linear time in…
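
To make the attention approximation concrete, here is a hedged sketch of one plausible way to use sampled masks; it is not the authors' algorithm. It assumes a symmetric (non-causal) bias exp(-m|i-j|), blocks sampled by independent cuts with probability 1 - exp(-m), and an estimator that averages the masked numerator and the masked normalizer separately before dividing. The helper names are hypothetical, and the softmax is left unstabilized, so the sketch is only suitable for small inputs.

```python
import math
import random


def sample_blocks(n, slope, rng):
    """Return contiguous blocks as half-open (start, end) ranges.

    Assumption: a cut falls between adjacent positions independently
    with probability 1 - exp(-slope).
    """
    cut_prob = 1.0 - math.exp(-slope)
    blocks, start = [], 0
    for i in range(1, n):
        if rng.random() < cut_prob:
            blocks.append((start, i))
            start = i
    blocks.append((start, n))
    return blocks


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def exact_biased_attention(Q, K, V, slope):
    """Dense reference: softmax over q.k - slope * |i - j|, quadratic cost."""
    n, d_v = len(Q), len(V[0])
    out = []
    for i in range(n):
        w = [math.exp(dot(Q[i], K[j]) - slope * abs(i - j)) for j in range(n)]
        z = sum(w)
        out.append([sum(w[j] * V[j][c] for j in range(n)) / z
                    for c in range(d_v)])
    return out


def approx_biased_attention(Q, K, V, slope, num_masks, seed=0):
    """Approximate the biased attention using only within-block pairs.

    For each sampled mask, unbiased weights exp(q.k) are accumulated for
    pairs (i, j) in the same block. The work per mask is the sum of
    squared block sizes rather than n * n.
    """
    rng = random.Random(seed)
    n, d_v = len(Q), len(V[0])
    num = [[0.0] * d_v for _ in range(n)]  # masked weighted sums of values
    den = [0.0] * n                        # masked normalizers
    for _ in range(num_masks):
        for start, end in sample_blocks(n, slope, rng):
            for i in range(start, end):
                for j in range(start, end):
                    w = math.exp(dot(Q[i], K[j]))
                    den[i] += w
                    for c in range(d_v):
                        num[i][c] += w * V[j][c]
    # The 1/num_masks factor cancels between numerator and normalizer.
    return [[num[i][c] / den[i] for c in range(d_v)] for i in range(n)]
```

Every position always shares a block with itself, so each normalizer is strictly positive and the division is safe. Comparing the output against `exact_biased_attention` on small random inputs gives a quick sanity check of how the error behaves as `num_masks` increases.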
