Structured-Sparse Attention for Entity Tracking with Subquadratic Sequence Complexity
Hangyue Zhao, Paul Caillon, Erwan Fagnou, Alexandre Allauzen

TL;DR
This paper introduces a structured-sparse attention method for entity tracking with subquadratic sequence complexity. It matches the accuracy of dense evaluation while cutting wall-clock time by 12-29% relative to the dense operator.
Contribution
The authors derive a blockwise evaluation of a resolvent-style operator that exploits the block structure of learned attention. This makes entity tracking over long sequences efficient at reduced computational cost.
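The following sketch illustrates a blockwise evaluation that keeps within-block interactions exact and routes the remainder through a small reduced system. It uses a block-diagonal solve plus a Woodbury correction. This is an illustration under stated assumptions, not the authors' algorithm. The resolvent-style operator is assumed to be (I - alpha (D + U W^T))^{-1} X, where D is block-diagonal and the residue is assumed to be low-rank (U W^T, rank r). The names blockwise_resolvent and block_diag_solve are hypothetical, and the data is synthetic.

```python
import numpy as np


def block_diag_solve(blocks, rhs, block_size):
    """Solve M z = rhs, where M = blockdiag(blocks), with one small dense solve per block."""
    out = np.empty_like(rhs)
    for i, Mb in enumerate(blocks):
        s = slice(i * block_size, (i + 1) * block_size)
        out[s] = np.linalg.solve(Mb, rhs[s])
    return out


def blockwise_resolvent(D_blocks, U, W, X, alpha, block_size):
    """Compute (I - alpha * (D + U W^T))^{-1} X without forming any n x n matrix.

    D_blocks: list of (b, b) within-block attention blocks, handled exactly.
    U, W:     (n, r) factors of the residue (assumed low-rank in this sketch).
    X:        (n, d) values to propagate.
    """
    b = block_size
    # M = I - alpha * D is block-diagonal, so it is solved one block at a time.
    M_blocks = [np.eye(b) - alpha * Db for Db in D_blocks]
    MinvX = block_diag_solve(M_blocks, X, b)          # M^{-1} X, shape (n, d)
    MinvP = block_diag_solve(M_blocks, alpha * U, b)  # M^{-1} (alpha U), shape (n, r)
    # Woodbury identity for (M - P W^T)^{-1} with P = alpha U:
    #   M^{-1} + M^{-1} P (I_r - W^T M^{-1} P)^{-1} W^T M^{-1}
    r = U.shape[1]
    S = np.eye(r) - W.T @ MinvP                       # reduced r x r system
    return MinvX + MinvP @ np.linalg.solve(S, W.T @ MinvX)


# Usage on synthetic data, checked against a dense O(n^3) solve.
rng = np.random.default_rng(0)
n, b, r, d, alpha = 64, 8, 2, 4, 0.9
# Row-normalized local blocks carrying 90% of each row's mass.
D_blocks = []
for _ in range(n // b):
    Db = rng.random((b, b))
    D_blocks.append(0.9 * Db / Db.sum(axis=1, keepdims=True))
# Small nonnegative low-rank residue, scaled so each row adds at most 0.1 mass.
U = rng.random((n, r))
W = rng.random((n, r))
W *= 0.1 / (U @ W.T).sum(axis=1).max()
X = rng.standard_normal((n, d))

Y_block = blockwise_resolvent(D_blocks, U, W, X, alpha, b)

# Dense reference: materialize A = D + U W^T and solve the full n x n system.
A = U @ W.T
for i, Db in enumerate(D_blocks):
    A[i * b:(i + 1) * b, i * b:(i + 1) * b] += Db
Y_dense = np.linalg.solve(np.eye(n) - alpha * A, X)
```

For fixed b, r and d, each step touches only one b x b block at a time or the r x r reduced system, so the cost of this sketch grows roughly linearly in n. The paper's actual reduced system and complexity bound may differ from this low-rank assumption.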
Findings
Reduces wall-clock time by 12-29% compared with the dense operator.
Matches the dense operator's accuracy on tracking benchmarks.
Up to 2.4× faster than a compact dense Transformer.
Abstract
Entity tracking requires maintaining and updating latent states for entities and attributes over long sequences. Recent task-specific attention operators can compress deep Transformer stacks into a few layers by performing multi-hop state propagation within a single layer, but their dense evaluation remains expensive. We show that in this setting, learned attention is strongly structured: most mass concentrates in local block-diagonal neighborhoods with a light cross-block residue. Exploiting this, we derive a blockwise evaluation of a resolvent-style operator that keeps within-block interactions exact and routes cross-block interactions through a reduced system. The resulting evaluation is subquadratic in sequence length (and when ). On controlled tracking benchmarks, our method matches the dense operator's accuracy while reducing wall-clock time…
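The abstract makes two claims that a small numerical sketch can make concrete. First, a resolvent-style operator performs multi-hop propagation within one layer. Second, the attention mass concentrates in block-diagonal neighborhoods. The sketch assumes (this is not stated in the paper) that the operator is (I - alpha A)^{-1} X, where A is a row-stochastic attention matrix and 0 < alpha < 1. Under that assumption the operator equals the Neumann series sum_k alpha^k A^k X, and the k-th term corresponds to k propagation hops. The attention matrix below is synthetic with an artificial local bias, not learned. The helpers softmax_rows, block_diagonal_mass and multi_hop are hypothetical.

```python
import numpy as np


def softmax_rows(logits):
    """Row-wise softmax, so each row of the result is an attention distribution."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def block_diagonal_mass(A, block_size):
    """Fraction of total attention mass inside the diagonal blocks of width block_size."""
    n = A.shape[0]
    inside = sum(A[i:i + block_size, i:i + block_size].sum()
                 for i in range(0, n, block_size))
    return inside / A.sum()


def multi_hop(A, X, alpha, hops):
    """Truncated Neumann series sum_{k=0}^{hops} (alpha A)^k X: one term per hop."""
    out = X.copy()
    term = X.copy()
    for _ in range(hops):
        term = alpha * (A @ term)
        out += term
    return out


rng = np.random.default_rng(0)
n, b, d, alpha = 64, 8, 4, 0.5
# Synthetic logits with a +4 bonus when query and key share a block (local bias).
same_block = (np.arange(n)[:, None] // b) == (np.arange(n)[None, :] // b)
A = softmax_rows(rng.standard_normal((n, n)) + 4.0 * same_block)
X = rng.standard_normal((n, d))

print(block_diagonal_mass(A, b))                       # most of the mass is local
print(block_diagonal_mass(np.full((n, n), 1 / n), b))  # uniform baseline: b / n = 0.125

# The resolvent accounts for all hops in one solve; the series adds them one by one.
Y_resolvent = np.linalg.solve(np.eye(n) - alpha * A, X)
Y_hops = multi_hop(A, X, alpha, hops=60)
```

Under this assumption, evaluating the resolvent densely requires an O(n^3) solve, or O(n^2 d) work per hop if the series is used instead. The blockwise evaluation is designed to avoid this cost.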
