Immediate Derivatives Suffice for Online Recurrent Adaptation

Aur Shalev Merin

arXiv:2603.28750·cs.LG·April 29, 2026

TL;DR

This paper shows that online recurrent learning does not need Jacobian propagation: dropping it entirely ($d=0$) matches full RTRL on the tested tasks while requiring only $O(n^{2})$ memory.

Contribution

The paper introduces a zero-propagation method ($d=0$) for online recurrent adaptation that matches full RTRL performance while needing only $O(n^{2})$ memory.

Findings

01. Zero-propagation ($d=0$) matches full RTRL on BCI and synthetic tasks.

02. Decomposition $g_{RTRL} = g_{imm} + g_{past}$ explains adaptation dynamics.

03. Memory savings of 1000x at $n=1024$ with no measured recovery cost.
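
One way to read the 1000x figure is as a comparison between the $n \times n \times n$ sensitivity tensor that full RTRL carries for $W_{hh}$ and a single $n \times n$ buffer for $d=0$. This excerpt does not spell out the accounting, so that reading is an assumption, and the function name `rtrl_memory` is hypothetical. A back-of-the-envelope sketch:

```python
def rtrl_memory(n: int, bytes_per_float: int = 8):
    """Return (full_bytes, d0_bytes, ratio) under the assumed accounting."""
    # Assumption: full RTRL stores dh/dW_hh, a tensor with n * n * n entries.
    full_bytes = n ** 3 * bytes_per_float
    # Assumption: d=0 only needs one n x n gradient buffer for W_hh.
    d0_bytes = n ** 2 * bytes_per_float
    return full_bytes, d0_bytes, full_bytes / d0_bytes


full_bytes, d0_bytes, ratio = rtrl_memory(1024)
print(full_bytes / 2**30, "GiB vs", d0_bytes / 2**20, "MiB, ratio", ratio)
```

Under this accounting the ratio is simply $n$, so roughly 1000 at $n=1024$ (8 GiB versus 8 MiB in float64).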

Abstract

For three decades online recurrent learning has been assumed to require propagating a Jacobian tensor through the network's dynamics at $O(n^{4})$ per step. We show it doesn't. Dropping the propagation entirely ($d = 0$, $O(n^{2})$ memory) matches full RTRL within CI on held-out BCI cross-session drift (TOST equivalent within $\pm 3$ pp at $n = 20$, Adam, float64), and across vanilla-RNN synthetic cells (sine and Lorenz under Adam and SGD) and LSTM/sine under Adam. A decomposition $g_{RTRL} = g_{imm} + g_{past}$ explains why. On BCI, $g_{past}$ concentrates in a single direction (top-1 singular fraction 0.62-0.74 across four optimizers, vs 0.333 for $g_{imm}$), and the four-optimizer full-RTRL-vs-$d = 0$ recovery gap tracks each optimizer's per-layer update-magnitude ratio $\|\Delta W_{hh}\| / \|\Delta W_{out}\|$ monotonically. A stationary (no-drift) control collapses both concentrations to…
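
The decomposition $g_{RTRL} = g_{imm} + g_{past}$ can be illustrated on a small tanh RNN. The sketch below is not the paper's code. It assumes a vanilla cell $h_t = \tanh(W_{hh} h_{t-1} + W_{xh} x_t)$, a linear readout with squared-error loss, fixed weights, and it interprets $d=0$ as keeping only the immediate term. The names `immediate_grad`, `rtrl_step`, and `final_loss` are hypothetical.

```python
import numpy as np


def immediate_grad(W_hh, W_xh, W_out, h_prev, x, target):
    """Assumed d=0 gradient for W_hh: h_prev is treated as a constant."""
    h = np.tanh(W_hh @ h_prev + W_xh @ x)
    delta = W_out.T @ (W_out @ h - target)   # dL/dh for L = 0.5 * ||W_out h - target||^2
    return np.outer(delta * (1.0 - h**2), h_prev)   # only n x n storage


def rtrl_step(W_hh, W_xh, W_out, h_prev, P_prev, x, target):
    """One full-RTRL step; P[k, i, j] = d h[k] / d W_hh[i, j]."""
    n = h_prev.shape[0]
    h = np.tanh(W_hh @ h_prev + W_xh @ x)
    D = 1.0 - h**2                            # derivative of tanh at the pre-activation
    idx = np.arange(n)
    imm = np.zeros((n, n, n))
    imm[idx, idx, :] = D[:, None] * h_prev[None, :]   # direct dependence on W_hh
    past = D[:, None, None] * np.einsum("kl,lij->kij", W_hh, P_prev)  # via h_prev
    delta = W_out.T @ (W_out @ h - target)
    g_imm = np.einsum("k,kij->ij", delta, imm)
    g_past = np.einsum("k,kij->ij", delta, past)
    return h, imm + past, g_imm, g_past


def final_loss(W_hh, W_xh, W_out, xs, targets):
    """Loss at the last step, used only to check the gradient numerically."""
    h = np.zeros(W_hh.shape[0])
    for x in xs:
        h = np.tanh(W_hh @ h + W_xh @ x)
    return 0.5 * np.sum((W_out @ h - targets[-1]) ** 2)


rng = np.random.default_rng(0)
n, m, T = 4, 2, 5                             # toy sizes, chosen for illustration
W_hh = rng.normal(scale=0.5, size=(n, n))
W_xh = rng.normal(scale=0.5, size=(n, m))
W_out = rng.normal(scale=0.5, size=(1, n))
xs = rng.normal(size=(T, m))
targets = rng.normal(size=(T, 1))

h, P = np.zeros(n), np.zeros((n, n, n))
for x, target in zip(xs, targets):
    h_prev = h
    h, P, g_imm, g_past = rtrl_step(W_hh, W_xh, W_out, h_prev, P, x, target)

g_rtrl = g_imm + g_past                       # full RTRL gradient at the last step
g_d0 = immediate_grad(W_hh, W_xh, W_out, h_prev, xs[-1], targets[-1])
```

Here `g_d0` reproduces `g_imm` without ever forming the three-index tensor `P`, which is where the memory saving comes from. With weights held fixed, `g_rtrl` should agree with a finite-difference gradient of `final_loss`.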
