TL;DR
This paper introduces Sample Weight Decay, a novel method to mitigate plasticity loss in deep reinforcement learning caused by non-stationarity, by restoring gradient magnitude and improving continual learning.
Contribution
It provides a theoretical analysis of plasticity loss mechanisms and proposes Sample Weight Decay to address gradient attenuation, enhancing RL performance.
Findings
Sample Weight Decay effectively alleviates plasticity loss across multiple RL algorithms.
The method improves learning performance and achieves state-of-the-art results on DMC Humanoid tasks.
Theoretical insights link plasticity loss to NTK rank collapse and gradient decay.
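The two quantities named in the last finding can be measured directly on a small network. The following is a minimal sketch, not the paper's code: it assumes a toy two-layer tanh MLP with scalar output, builds the empirical NTK Gram matrix from per-sample Jacobians, and reports its numerical rank and the gradient norm of a squared loss. All function names, sizes, and the tolerance are hypothetical, and the "saturated" network (a large hidden bias) is only an artificial stand-in for a network that has lost plasticity; it is not the mechanism analyzed in the paper.

```python
import numpy as np

def forward(params, X):
    """Scalar outputs f(x_i) of a two-layer tanh MLP (toy model, assumed for illustration)."""
    W1, b1, w2, b2 = params
    return np.tanh(X @ W1.T + b1) @ w2 + b2

def per_sample_jacobian(params, X):
    """Jacobian of f(x_i) w.r.t. all parameters, shape (n, P)."""
    W1, b1, w2, b2 = params
    n = X.shape[0]
    H = np.tanh(X @ W1.T + b1)             # hidden activations, (n, h)
    dZ = (1.0 - H ** 2) * w2               # df/d(pre-activation), (n, h); also df/db1
    dW1 = dZ[:, :, None] * X[:, None, :]   # df/dW1, (n, h, d)
    # Column order: W1 (flattened), b1, w2, b2
    return np.concatenate([dW1.reshape(n, -1), dZ, H, np.ones((n, 1))], axis=1)

def ntk_gram(params, X):
    """Empirical NTK Gram matrix K[i, j] = <grad f(x_i), grad f(x_j)>."""
    J = per_sample_jacobian(params, X)
    return J @ J.T

def numerical_rank(K, rel_tol=1e-8):
    """Count singular values above rel_tol times the largest one."""
    s = np.linalg.svd(K, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))

def grad_norm(params, X, y):
    """Norm of the gradient of 0.5 * mean((f(x) - y)^2), i.e. J^T r / n."""
    J = per_sample_jacobian(params, X)
    residual = forward(params, X) - y
    return float(np.linalg.norm(J.T @ residual / len(y)))

rng = np.random.default_rng(0)
n, d, h = 32, 4, 8
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

# Freshly initialized network versus an artificially saturated copy.
healthy = (rng.normal(size=(h, d)) / np.sqrt(d), np.zeros(h),
           rng.normal(size=h) / np.sqrt(h), 0.0)
saturated = (healthy[0], np.full(h, 50.0), healthy[2], 0.0)

K_healthy = ntk_gram(healthy, X)
K_saturated = ntk_gram(saturated, X)
rank_healthy = numerical_rank(K_healthy)
rank_saturated = numerical_rank(K_saturated)

print("NTK rank:", rank_healthy, "(healthy) vs", rank_saturated, "(saturated)")
print("grad norm:", grad_norm(healthy, X, y), "(healthy) vs",
      grad_norm(saturated, X, y), "(saturated)")
```

In the saturated copy every hidden unit outputs 1, so all per-sample gradients coincide and the Gram matrix collapses to rank one. Tracking these two diagnostics over training is one simple way to monitor the effects the summary describes.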
Abstract
Deep reinforcement learning (RL) suffers severely from plasticity loss because of its inherent non-stationarity, which impairs the ability to adapt to new data and learn continually. Unfortunately, our understanding of how plasticity loss arises, dissipates, and can be dissolved remains limited to empirical findings, leaving the theoretical side underexplored. To address this gap, we study the plasticity loss problem from the theoretical perspective of network optimization. By formally characterizing the two culprit factors in the online RL process, namely the non-stationarity of data distributions and the non-stationarity of targets induced by bootstrapping, our theory attributes the loss of plasticity to two mechanisms: the rank collapse of the Neural Tangent Kernel (NTK) Gram matrix and the decay of gradient magnitude. The first mechanism echoes prior empirical findings from…