Higher-order Linear Attention

Yifan Zhang; Zhen Qin; Mengdi Wang; Quanquan Gu

arXiv:2510.27258·cs.LG·May 15, 2026

TL;DR

Higher-order Linear Attention (HLA) is a causal, streaming attention mechanism that captures higher-order interactions through compact prefix summaries, targeting long-context autoregressive language modeling without the quadratic cost of scaled dot-product attention.

Contribution

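The paper introduces HLA, a higher-order, linear-time attention mechanism with closed-form streaming identities, a strictly causal masked variant, and a chunk-parallel training scheme that reproduces the serial recurrence exactly, extending expressivity beyond first-order linear attention.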

Findings

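01. In the second-order case, HLA maintains a constant-size state and computes per-token outputs in linear time, without materializing any $n \times n$ matrices.

02. A chunk-parallel training scheme based on associative scans reproduces the activations of a serial recurrence exactly.

03. Extensions to third and higher orders are outlined.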
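
The constant-size-state claim can be illustrated with a minimal NumPy sketch. It assumes one plausible second-order form, in which the state is two running sums (`S` of key outer products, `C` of query-value outer products) and the unnormalized output at step t is `q_t^T S_t C_t`. The function names, the choice of summaries, and the omission of normalization and strict causal masking are assumptions made for illustration; this is not the authors' implementation.

```python
import numpy as np

def hla2_streaming(Q, K, V):
    """Unnormalized second-order outputs from constant-size prefix summaries (illustrative)."""
    n, d = Q.shape
    d_v = V.shape[1]
    S = np.zeros((d, d))      # running sum of k_i k_i^T over the prefix
    C = np.zeros((d, d_v))    # running sum of q_j v_j^T over the prefix
    out = np.empty((n, d_v))
    for t in range(n):
        S += np.outer(K[t], K[t])
        C += np.outer(Q[t], V[t])
        # Two small products per token; cost depends on d and d_v, not on t.
        out[t] = (Q[t] @ S) @ C
    return out

def hla2_reference(Q, K, V):
    """Brute-force check that materializes the prefix weights explicitly."""
    n = Q.shape[0]
    out = np.zeros((n, V.shape[1]))
    for t in range(n):
        a = Q[t] @ K[: t + 1].T           # q_t . k_i for all i <= t
        B = K[: t + 1] @ Q[: t + 1].T     # k_i . q_j for all i, j <= t
        out[t] = (a @ B) @ V[: t + 1]
    return out

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((6, 4)) for _ in range(3))
assert np.allclose(hla2_streaming(Q, K, V), hla2_reference(Q, K, V))
```

The streaming version never forms a matrix whose size grows with the sequence length: the state is `d x d` plus `d x d_v` regardless of how many tokens have been consumed. In this unmasked form the inner indices range over the whole prefix up to t.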

Abstract

The quadratic cost of scaled dot-product attention is a central obstacle to scaling autoregressive language models to long contexts. Linear-time attention and State Space Models (SSMs) provide scalable alternatives but are typically restricted to first-order or kernel-based approximations, which can limit expressivity. We introduce Higher-order Linear Attention (HLA), a causal, streaming mechanism that realizes higher interactions via compact prefix sufficient statistics. In the second-order case, HLA maintains a constant-size state and computes per-token outputs in linear time without materializing any $n \times n$ matrices. We give closed-form streaming identities, a strictly causal masked variant using two additional summaries, and a chunk-parallel training scheme based on associative scans that reproduces the activations of a serial recurrence exactly. We further outline extensions…
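
To make the chunk-parallel idea concrete, the following NumPy sketch shows how a prefix state with a cross term can be merged across segments by an associative operator, so that per-chunk scans followed by a carry pass give the same states as a token-by-token recurrence. The state layout `(S, C, G)`, with `G` accumulating `k_i k_i^T C_{i-1}`, is an assumed example of an extra summary of the kind a masked variant might use; the names `token_state`, `combine`, `serial_states`, and `chunked_states` are hypothetical and not taken from the paper or its repository.

```python
import numpy as np

def token_state(q, k, v):
    """Summary of a one-token segment; G is zero because no earlier C exists inside it."""
    return np.outer(k, k), np.outer(q, v), np.zeros((k.size, v.size))

def combine(a, b):
    """Merge the summary of segment a followed by segment b (associative, not commutative)."""
    Sa, Ca, Ga = a
    Sb, Cb, Gb = b
    # Tokens in b see all of a's C as earlier context, hence the cross term Sb @ Ca.
    return Sa + Sb, Ca + Cb, Ga + Gb + Sb @ Ca

def serial_states(Q, K, V):
    """Token-by-token recurrence, returning the state after every position."""
    state = token_state(Q[0], K[0], V[0])
    states = [state]
    for t in range(1, len(Q)):
        state = combine(state, token_state(Q[t], K[t], V[t]))
        states.append(state)
    return states

def chunked_states(Q, K, V, chunk):
    """Scan each chunk locally, then fold in the carry from all earlier chunks."""
    states, carry = [], None
    for s in range(0, len(Q), chunk):
        # Local scans depend only on their own chunk, so they could run in parallel.
        local = serial_states(Q[s:s + chunk], K[s:s + chunk], V[s:s + chunk])
        if carry is not None:
            local = [combine(carry, x) for x in local]
        states.extend(local)
        carry = local[-1]
    return states

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((10, 3)) for _ in range(3))
for a, b in zip(serial_states(Q, K, V), chunked_states(Q, K, V, chunk=4)):
    assert all(np.allclose(x, y) for x, y in zip(a, b))
```

The two paths agree algebraically because `combine` is associative; in floating point the comparison here is made up to rounding tolerance. The carry loop is written sequentially for brevity, though it is itself a scan over per-chunk totals.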

Code & Models

Repositories

yifanzhang-pro/HLA (GitHub)