WriteSAE: Sparse Autoencoders for Recurrent State

Jack Young

arXiv:2605.12770·cs.LG·May 21, 2026


TL;DR

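WriteSAE is a sparse autoencoder that learns rank-1 matrix atoms with the same shape as the writes a recurrent language model makes to its state. Substituting an atom for the model's own write gives a closer final token distribution than deleting the write on 92.4% of evaluated positions, and the atoms enable cache-level steering of generation.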

Contribution

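The paper presents WriteSAE, a sparse autoencoder whose rank-1 matrix atoms match the shape of the matrix updates written into recurrent language-model state. Because an atom can be substituted directly for a write, each atom can be tested and used as a handle for interpreting and controlling the model.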

Findings

01

Replacing a write with its atom gives a closer final token distribution than deleting the write on 92.4% of evaluated positions (89.8% when averaged per atom)

02

For Gated DeltaNet, a formula using the forget gate, read query, and output embedding predicts the resulting logit change with $R^{2} = 0.98$

03

Generation steering increases token appearance in continuations from 33.3% to 100%

Abstract

We introduce WriteSAE, a sparse autoencoder for the matrix updates written into recurrent language-model state. In Gated DeltaNet, Mamba-2, and RWKV-7, each token writes a matrix-shaped update to a recurrent cache; a residual-stream SAE has vector-shaped atoms and cannot replace that update directly. WriteSAE learns rank-1 matrix atoms with the same shape as the model's own write. This lets us test a direct replacement: at positions where the SAE activates an atom, we remove the model's write, insert the atom scaled by its SAE activation, and continue the forward pass. The atom gives a closer final token distribution than deleting the write on 92.4% of evaluated positions; averaged per atom, the rate is 89.8%. For Gated DeltaNet, a formula using the forget gate, read query, and output embedding predicts the resulting logit change with $R^{2} = 0.98$. The same replacement test transfers to…
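The replacement test described in the abstract can be illustrated with a small NumPy sketch. Everything here is an assumption made for illustration: the recurrence is a simplified gated linear update, S_t = g_t * S_{t-1} + outer(k_t, v_t), rather than the actual Gated DeltaNet, Mamba-2, or RWKV-7 update; the function names (`final_logits`, `compare_replacement`) are hypothetical and not taken from the paper's repository; and the atom is built by hand as a noisy copy of the true write direction instead of being learned by an SAE. The sketch compares the final token distribution after deleting one write against the distribution after substituting a scaled rank-1 atom for it.

```python
import numpy as np


def log_softmax(x):
    """Numerically stable log-softmax over a 1-D array."""
    x = x - x.max()
    return x - np.log(np.exp(x).sum())


def final_logits(keys, values, gates, query, w_out, override=None):
    """Run a toy gated matrix recurrence and read out logits at the end.

    Toy update (an assumption, not any specific architecture):
        S_t = g_t * S_{t-1} + write_t,  write_t = outer(k_t, v_t)
    `override` maps a position to the matrix written there instead of
    the model's own write.
    """
    override = override or {}
    state = np.zeros((keys.shape[1], values.shape[1]))
    for t in range(len(keys)):
        write = override.get(t, np.outer(keys[t], values[t]))
        state = gates[t] * state + write
    return (query @ state) @ w_out


def kl(p_logits, q_logits):
    """KL(p || q) between the softmax distributions of two logit vectors."""
    lp, lq = log_softmax(p_logits), log_softmax(q_logits)
    return float(np.sum(np.exp(lp) * (lp - lq)))


def compare_replacement(keys, values, gates, query, w_out, pos, atom, activation):
    """Return (KL after atom replacement, KL after deleting the write)."""
    clean = final_logits(keys, values, gates, query, w_out)
    ablated = final_logits(keys, values, gates, query, w_out,
                           override={pos: np.zeros_like(atom)})
    replaced = final_logits(keys, values, gates, query, w_out,
                            override={pos: activation * atom})
    return kl(clean, replaced), kl(clean, ablated)


# Toy data; all sizes are arbitrary.
rng = np.random.default_rng(0)
T, d_k, d_v, vocab = 6, 8, 8, 16
keys = 0.5 * rng.normal(size=(T, d_k))
values = rng.normal(size=(T, d_v))
gates = rng.uniform(0.5, 1.0, size=T)
query = 0.5 * rng.normal(size=d_k)
w_out = 0.3 * rng.normal(size=(d_v, vocab))

# Hypothetical atom: a noisy rank-1 copy of the write direction at `pos`.
pos = 2
u = keys[pos] / np.linalg.norm(keys[pos]) + 0.05 * rng.normal(size=d_k)
w = values[pos] / np.linalg.norm(values[pos]) + 0.05 * rng.normal(size=d_v)
atom = np.outer(u, w)
activation = np.linalg.norm(keys[pos]) * np.linalg.norm(values[pos])

kl_replaced, kl_ablated = compare_replacement(
    keys, values, gates, query, w_out, pos, atom, activation)
print(f"KL(clean || replaced) = {kl_replaced:.4g}")
print(f"KL(clean || ablated)  = {kl_ablated:.4g}")
```

Under this setup, a position would count toward the paper's 92.4% figure when the replaced KL is smaller than the ablated KL. The choice of KL divergence as the distance between distributions is also an assumption of the sketch.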


Code & Models

Repositories

jackyoung27/writesae
github

Models

🤗
JackYoung27/writesae-ckpts
model
