Structured-Sparse Attention for Entity Tracking with Subquadratic Sequence Complexity

Hangyue Zhao; Paul Caillon; Erwan Fagnou; Alexandre Allauzen

arXiv:2605.22476·cs.LG·May 22, 2026

TL;DR

This paper introduces a structured-sparse attention method for entity tracking whose cost is subquadratic in sequence length. It matches the accuracy of the dense operator on tracking benchmarks while cutting wall-clock time by 12-29%.

Contribution

The authors develop a blockwise evaluation of a resolvent-style operator exploiting learned attention structure, enabling efficient long-sequence entity tracking with reduced computational cost.

Findings

01

Achieves a 12-29% reduction in wall-clock time compared to the dense operator.

02

Matches dense operator accuracy on tracking benchmarks.

03

Up to 2.4 times faster than a compact dense Transformer.

Abstract

Entity tracking requires maintaining and updating latent states for entities and attributes over long sequences. Recent task-specific attention operators can compress deep Transformer stacks into a few layers by performing multi-hop state propagation within a single layer, but their dense evaluation remains expensive. We show that in this setting, learned attention is strongly structured: most mass concentrates in local block-diagonal neighborhoods with a light cross-block residue. Exploiting this, we derive a blockwise evaluation of a resolvent-style operator that keeps within-block interactions exact and routes cross-block interactions through a reduced system. The resulting evaluation is subquadratic in sequence length, $O(n^{4/3} d)$ (and $O(n^{7/3})$ when $d \approx n$). On controlled tracking benchmarks, our method matches the dense operator's accuracy while reducing wall-clock time…
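
The general idea of keeping within-block solves exact and routing the remainder through a smaller system can be illustrated with a short NumPy sketch. This is not the authors' algorithm: the abstract does not give the operator's exact form, so the sketch assumes a resolvent of the form $(I - A)^{-1} V$, a block-diagonal part $D$, and a rank-$r$ correction $U W^\top$ standing in for the cross-block residue (so $A = D + U W^\top$). Under those assumptions the Woodbury identity gives an exact blockwise evaluation. All function and variable names are hypothetical.

```python
import numpy as np


def dense_resolvent(A, V):
    """Reference evaluation: solve (I - A) X = V with one dense n x n solve."""
    n = A.shape[0]
    return np.linalg.solve(np.eye(n) - A, V)


def blockwise_resolvent(D_blocks, U, W, V):
    """Solve (I - D - U W^T) X = V, where D = blockdiag(D_blocks).

    Assumption (not from the paper): the cross-block part is exactly
    low rank, so the Woodbury identity applies with M = I - D.
    """
    b = D_blocks[0].shape[0]  # all blocks assumed to share size b

    def solve_block_diag(Y):
        # Exact within-block solves: (I - D_k) X_k = Y_k, one small system per block.
        out = np.empty_like(Y)
        for k, Dk in enumerate(D_blocks):
            rows = slice(k * b, (k + 1) * b)
            out[rows] = np.linalg.solve(np.eye(b) - Dk, Y[rows])
        return out

    MV = solve_block_diag(V)  # M^{-1} V
    MU = solve_block_diag(U)  # M^{-1} U
    r = U.shape[1]
    S = np.eye(r) - W.T @ MU  # reduced r x r system coupling the blocks
    return MV + MU @ np.linalg.solve(S, W.T @ MV)


# Small synthetic check (illustrative sizes, random data).
rng = np.random.default_rng(0)
n, b, r, d = 24, 4, 3, 5
D_blocks = [0.1 * rng.standard_normal((b, b)) for _ in range(n // b)]
U = 0.05 * rng.standard_normal((n, r))
W = 0.05 * rng.standard_normal((n, r))
V = rng.standard_normal((n, d))

# Assemble the full matrix A = D + U W^T for the dense reference.
A = U @ W.T
for k, Dk in enumerate(D_blocks):
    A[k * b:(k + 1) * b, k * b:(k + 1) * b] += Dk

X_dense = dense_resolvent(A, V)
X_block = blockwise_resolvent(D_blocks, U, W, V)
max_abs_diff = np.max(np.abs(X_dense - X_block))
```

In this sketch the two evaluations agree up to floating-point error because the low-rank assumption holds exactly. For a learned attention matrix the residue would only be approximately structured, and how the paper constructs its reduced system and obtains the $O(n^{4/3} d)$ cost is not recoverable from the abstract.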
