Adaptive Memory Decay for Log-Linear Attention

Yaxita Amin; Helen Zichen Li; Mengfan Zhang; Samet Ayhan

arXiv:2605.06946·cs.LG·May 11, 2026

TL;DR

This paper introduces a method to adaptively learn memory decay parameters in log-linear attention models, enhancing their ability to recall relevant information in long sequences.

Contribution

It proposes a lightweight, input-dependent decay mechanism for log-linear attention that improves long-range memory recall while preserving log-linear compute complexity.

Findings

01

Input-dependent decay outperforms fixed decay in associative recall and language modeling.

02

Largest improvements occur in long-range memory tasks where fixed decay fails.

03

The method preserves log-linear complexity with negligible parameter overhead.

Abstract

Sequence models face a fundamental tradeoff between memory capacity and computational efficiency. Transformers achieve expressive context modeling at quadratic cost, while linear attention and state-space models run in linear time by compressing context into a fixed-size hidden state, inherently limiting recall. Log-linear attention navigates this tradeoff by organizing memory across a Fenwick tree hierarchy, growing its hidden state logarithmically with sequence length at log-linear compute cost. However, its memory decay parameter λ is fixed and independent of the input, assigning uniform weights across all hierarchy levels regardless of the content, which introduces unnecessary rigidity. We propose learning λ directly from the input via a lightweight two-layer MLP, producing per-token, per-level decay that adapts to content rather than position. A softplus activation…
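As a rough illustration of the idea in the abstract, the following NumPy sketch maps each token embedding to one decay value per hierarchy level with a two-layer MLP. It is not the authors' implementation: the function names, the hidden width, the tanh hidden activation, the initialization, the level-count formula, and the choice to apply softplus to the MLP output (to keep the decay positive) are all assumptions made for this example.

```python
import numpy as np

def softplus(x):
    # Numerically stable softplus: log(1 + exp(x)), always positive.
    return np.logaddexp(0.0, x)

def num_levels(seq_len):
    # Assumed level count for a Fenwick-style hierarchy over seq_len tokens:
    # floor(log2(seq_len)) + 1, so the level count grows logarithmically.
    return int(seq_len).bit_length()

def init_decay_mlp(d_model, d_hidden, n_levels, seed=0):
    # Hypothetical parameter layout for the two-layer MLP.
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, d_model ** -0.5, (d_model, d_hidden)),
        "b1": np.zeros(d_hidden),
        "W2": rng.normal(0.0, d_hidden ** -0.5, (d_hidden, n_levels)),
        "b2": np.zeros(n_levels),
    }

def adaptive_decay(params, x):
    # x: (T, d_model) token embeddings.
    # Returns lam: (T, n_levels), one positive decay per token and per level.
    h = np.tanh(x @ params["W1"] + params["b1"])       # assumed hidden activation
    return softplus(h @ params["W2"] + params["b2"])   # assumed softplus placement

if __name__ == "__main__":
    T, d_model, d_hidden = 16, 8, 4
    x = np.random.default_rng(1).normal(size=(T, d_model))
    params = init_decay_mlp(d_model, d_hidden, num_levels(T))
    lam = adaptive_decay(params, x)
    print(lam.shape)  # one row per token, one column per level
```

Because the MLP output has only as many columns as there are hierarchy levels, the extra parameters scale with the logarithm of the sequence length rather than with the sequence length itself, which is consistent with the "negligible parameter overhead" claim above.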
