Immediate Derivatives Suffice for Online Recurrent Adaptation
Aur Shalev Merin

TL;DR
This paper shows that online recurrent learning can be simplified by dropping Jacobian propagation entirely, matching full real-time recurrent learning (RTRL) at a much lower memory and computational cost.
Contribution
It introduces a zero-propagation method ($d=0$) for online recurrent adaptation that matches full RTRL performance with much lower memory requirements.
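A minimal sketch of what dropping propagation means. It assumes a vanilla tanh RNN $h_t = \tanh(W h_{t-1} + U x_t)$ that adapts only its recurrent matrix $W$ under a squared loss, and it reads $d$ as the number of steps the Jacobian is propagated. The function names are hypothetical, and this is the textbook RTRL recursion, not the paper's code. Full RTRL carries an $n \times n \times n$ sensitivity tensor forward. The $d=0$ variant keeps only the immediate derivative, which collapses to an outer product. The RTRL gradient splits into an immediate part and a propagated part, which mirrors the $g_{RTRL} = g_{imm} + g_{past}$ decomposition listed under Findings, assuming that is what those symbols denote.

```python
import numpy as np


def rnn_step(W, U, h_prev, x):
    """Vanilla RNN transition h_t = tanh(W h_{t-1} + U x_t) (biases omitted)."""
    return np.tanh(W @ h_prev + U @ x)


def rtrl_step(W, U, h_prev, x, y, P_prev):
    """Full RTRL step for the recurrent matrix W (hypothetical helper).

    P[k, i, j] = d h_t[k] / d W[i, j] is carried forward: an n x n x n tensor.
    Returns h_t, the new P, and the gradient of 0.5 * ||h_t - y||^2 together
    with its immediate and propagated parts.
    """
    n = W.shape[0]
    h = rnn_step(W, U, h_prev, x)
    D = 1.0 - h**2  # tanh'(a_t), computed from h_t

    # Immediate Jacobian with h_{t-1} held fixed: nonzero only where k == i.
    imm = np.zeros((n, n, n))
    idx = np.arange(n)
    imm[idx, idx, :] = D[:, None] * h_prev[None, :]

    # Propagated part: push the previous sensitivity through W and tanh'.
    past = D[:, None, None] * np.einsum("km,mij->kij", W, P_prev)

    P = imm + past
    e = h - y  # dL/dh_t for the squared loss
    g_imm = np.einsum("k,kij->ij", e, imm)
    g_past = np.einsum("k,kij->ij", e, past)
    g_rtrl = np.einsum("k,kij->ij", e, P)
    return h, P, g_rtrl, g_imm, g_past


def zero_prop_step(W, U, h_prev, x, y):
    """d = 0 (hypothetical helper): immediate derivative only, no stored tensor."""
    h = rnn_step(W, U, h_prev, x)
    D = 1.0 - h**2
    e = h - y
    return h, np.outer(e * D, h_prev)  # equals g_imm above, in O(n^2) memory
```

With $W$ held fixed, `g_rtrl` equals `g_imm + g_past` at every step and the $d=0$ gradient equals `g_imm`. The two methods therefore differ only by `g_past`, and that term is what requires storing and updating the tensor `P`. An online learner would apply an optimizer step to $W$ after each call.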
Findings
Zero-propagation ($d=0$) matches full RTRL on BCI and synthetic tasks.
Decomposition $g_{RTRL} = g_{imm} + g_{past}$ explains adaptation dynamics.
Memory savings of 1000x at $n=1024$ with no measured recovery cost.
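The following back-of-envelope accounting is consistent with the roughly 1000× figure. It assumes a vanilla RNN whose adapted parameters are the $n \times n$ recurrent matrix. Under that assumption, full RTRL stores a sensitivity tensor with $n \cdot n^2 = n^3$ entries, while $d=0$ needs only an $n \times n$ immediate-gradient buffer. The ratio is therefore $n$, which is 1024 at $n=1024$. The paper's exact accounting may differ, for example by including input weights, biases, or optimizer state. The helper names are hypothetical.

```python
def rtrl_tensor_entries(n: int) -> int:
    """Entries of dh/dW for n hidden units and an n x n recurrent matrix: n * n^2."""
    return n * n * n


def zero_prop_entries(n: int) -> int:
    """Entries of the d=0 immediate gradient for the same W: n^2."""
    return n * n


n = 1024
bytes_per_entry = 8  # float64, the precision quoted in the abstract
rtrl_gib = rtrl_tensor_entries(n) * bytes_per_entry / 2**30
zero_mib = zero_prop_entries(n) * bytes_per_entry / 2**20
ratio = rtrl_tensor_entries(n) // zero_prop_entries(n)

print(f"RTRL sensitivity tensor: {rtrl_gib:.1f} GiB")  # 8.0 GiB
print(f"d=0 gradient buffer:     {zero_mib:.1f} MiB")  # 8.0 MiB
print(f"ratio: {ratio}x")                               # 1024x
```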
Abstract
For three decades, online recurrent learning has been assumed to require propagating a Jacobian tensor through the network's dynamics at every step. We show that it does not. Dropping the propagation entirely ($d=0$, at a fraction of RTRL's memory) matches full RTRL within the confidence interval on held-out BCI cross-session drift (TOST-equivalent within a percentage-point margin; Adam, float64). It also matches full RTRL across vanilla-RNN synthetic cells (sine and Lorenz under Adam and SGD) and on LSTM/sine under Adam. The decomposition $g_{RTRL} = g_{imm} + g_{past}$ explains why. On BCI, […] concentrates in a single direction (top-1 singular fraction 0.62-0.74 across four optimizers, vs. 0.333 for […]). In addition, the full-RTRL-vs-$d=0$ recovery gap across the four optimizers tracks each optimizer's per-layer update-magnitude ratio monotonically. A stationary (no-drift) control collapses both concentrations to…
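The abstract does not define "top-1 singular fraction". The sketch below assumes it means the share of squared singular-value mass in the leading singular direction of a matrix whose rows are gradient vectors. Under that definition, three orthogonal directions of equal weight give exactly 1/3 (about 0.333), and a single dominant direction pushes the value toward 1. This may or may not be how the paper computes it, and the function name is hypothetical.

```python
import numpy as np


def top1_singular_fraction(G):
    """Fraction of squared singular-value mass in G's leading direction.

    G is a 2-D array whose rows are (flattened) gradient vectors, e.g. one
    row per time step. Returns sigma_1**2 / sum(sigma_i**2): 1.0 when every
    row lies along one direction, 1/r when r orthogonal directions carry
    equal mass.
    """
    s = np.linalg.svd(np.asarray(G, dtype=float), compute_uv=False)  # descending
    energy = s**2
    return float(energy[0] / energy.sum())


rng = np.random.default_rng(0)
direction = rng.standard_normal(50)
rank_one = np.outer(rng.standard_normal(20), direction)  # every row parallel
noisy = rank_one + 0.5 * rng.standard_normal((20, 50))   # dominant direction + noise

print(top1_singular_fraction(rank_one))   # ~1.0
print(top1_singular_fraction(np.eye(3)))  # three equal directions: ~0.333
print(top1_singular_fraction(noisy))      # between the two
```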
