Massive Redundancy in Gradient Transport Enables Sparse Online Learning
Aur Shalev Merin

TL;DR
This paper shows that the recurrent Jacobian in recurrent neural networks is highly redundant, so propagating gradients through a small random subset of paths approximates full RTRL efficiently and stably across several architectures and on real neural data.
Contribution
The study shows that the recurrent Jacobian is massively redundant, so sparse RTRL performs comparably to full RTRL at significantly lower computational cost across multiple neural architectures.
Findings
In RNNs, propagating through a random 6% of paths (k=4 of n=64) recovers 84 +/- 6% of full RTRL's adaptation ability.
Sparse RTRL remains effective and stable as networks grow from n=64 to n=256 with the same absolute count k=4 (6% down to 1.6% of paths).
Sparse gradient transport outperforms dense methods in chaotic dynamics and real neural data.
Abstract
Real-time recurrent learning (RTRL) computes exact online gradients by propagating a Jacobian tensor forward through recurrent dynamics, but at O(n^4) cost per step. Prior work has sought structured approximations (rank-1 compression, graph-based sparsity, Kronecker factorization). We show that, in the continuous error signal regime, the recurrent Jacobian is massively redundant: propagating through a random 6% of paths (k=4 of n=64) recovers 84 +/- 6% of full RTRL's adaptation ability across five seeds, and the absolute count k=4 remains effective from n=64 to n=256 (6% to 1.6%, recovery 84% to 78%), meaning sparse RTRL becomes relatively cheaper as networks grow. In RNNs, the recovery is selection-invariant (even adversarial path selection works) and exhibits a step-function transition from zero to any nonzero propagation. Spectral analysis reveals the mechanism: the Jacobian is…
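To make the idea of propagating through k of n paths concrete, here is a minimal NumPy sketch of one RTRL step with an optional sparsity mask. It is an illustration under stated assumptions, not the paper's implementation: it assumes a vanilla tanh RNN, h_t = tanh(W h_{t-1} + U x_t), tracks only the sensitivity with respect to W, and interprets "k of n paths" as keeping k randomly chosen incoming recurrent connections per unit when transporting the sensitivity tensor. The names make_row_mask and rtrl_step are hypothetical.

```python
import numpy as np

def make_row_mask(n, k, rng):
    """Boolean (n, n) mask keeping k randomly chosen incoming paths per unit.

    Assumption: "k of n paths" means k recurrent connections per row of W.
    """
    mask = np.zeros((n, n), dtype=bool)
    for i in range(n):
        mask[i, rng.choice(n, size=k, replace=False)] = True
    return mask

def rtrl_step(W, U, h_prev, x, P_prev, mask=None):
    """One step of a tanh RNN plus the RTRL sensitivity update.

    P[i, a, b] holds d h[i] / d W[a, b]. With mask=None this is full RTRL;
    with a mask, the sensitivity is transported only through the kept paths.
    """
    n = W.shape[0]
    h = np.tanh(W @ h_prev + U @ x)
    d = 1.0 - h ** 2                      # derivative of tanh at the new state
    W_prop = W if mask is None else W * mask
    # Transport term: sum_j W[i, j] * P_prev[j, a, b].
    # Note: this dense einsum only emulates sparsity. It still costs O(n^4);
    # a real sparse version would gather the k kept entries per row.
    P = np.einsum("ij,jab->iab", W_prop, P_prev)
    # Immediate term: delta_{ia} * h_prev[b].
    idx = np.arange(n)
    P[idx, idx, :] += h_prev
    P *= d[:, None, None]
    return h, P

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, m, k = 64, 3, 4
    W = rng.normal(size=(n, n)) / np.sqrt(n)
    U = rng.normal(size=(n, m))
    mask = make_row_mask(n, k, rng)
    h, P = np.zeros(n), np.zeros((n, n, n))
    for _ in range(5):
        h, P = rtrl_step(W, U, h, rng.normal(size=m), P, mask=mask)
    print("fraction of paths kept:", mask.mean())  # k / n = 4 / 64
```

The forward pass always uses the full W; only the gradient transport is masked. The fractions quoted in the abstract follow directly from k/n: 4/64 is about 6% and 4/256 is about 1.6%.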
