Target-Aligned Reinforcement Learning
Leonard S. Pleiss, James Harrison, Maximilian Schiffer

TL;DR
TARL is a drop-in refinement for value-based deep reinforcement learning that emphasizes transitions whose target and online network estimates are well aligned, mitigating the effect of stale targets while keeping the stabilizing benefits of target networks.
Contribution
Introduces Target-Aligned Reinforcement Learning (TARL), a simple drop-in refinement that enhances existing value-based algorithms by emphasizing transitions with well-aligned targets, improving training stability and convergence speed.
Findings
38.18% peak score gain on Atari-10
Consistent improvements across various environments
Less than 4% increase in wall-clock time
Abstract
Many value-based deep reinforcement learning algorithms rely on target networks - lagged copies of the online network - to stabilize training. While effective, this mechanism introduces a fundamental stability-recency tradeoff: slower target updates improve stability but reduce the recency of learning signals, hindering convergence speed. We propose Target-Aligned Reinforcement Learning (TARL), a simple drop-in refinement for existing algorithms that emphasizes transitions for which the target and online network estimates are highly aligned. By focusing updates on well-aligned targets, TARL mitigates the adverse effects of stale target estimates while retaining the stabilizing benefits of target networks. We empirically demonstrate consistent improvements within discrete and continuous control algorithms across various benchmark environments without any hyperparameter tuning, including…
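The abstract does not specify how alignment is measured or how "emphasis" is applied, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a hypothetical alignment score (an exponential of the gap between the online and target networks' bootstrap estimates), a hypothetical `temperature` parameter, and a per-transition weighting of a standard squared TD loss. All function and parameter names are illustrative.

```python
import numpy as np


def alignment_weights(q_online_next, q_target_next, temperature=1.0):
    """Hypothetical alignment score (an assumption, not from the paper).

    Inputs have shape (batch, num_actions). The weight is large when the
    online and target networks agree on the bootstrap value of the next state.
    """
    gap = np.abs(q_online_next.max(axis=1) - q_target_next.max(axis=1))
    w = np.exp(-gap / temperature)
    # Normalize so the mean weight is 1, keeping the loss scale comparable
    # to an unweighted loss.
    return w / w.mean()


def weighted_td_loss(q_pred, rewards, dones, q_online_next, q_target_next,
                     gamma=0.99, temperature=1.0):
    """Squared TD loss with per-transition alignment weights (illustrative)."""
    # Standard target-network bootstrap target.
    targets = rewards + gamma * (1.0 - dones) * q_target_next.max(axis=1)
    td_error = targets - q_pred
    w = alignment_weights(q_online_next, q_target_next, temperature)
    return float(np.mean(w * td_error ** 2))


# Toy batch: the networks agree on transition 0 and disagree on transition 1.
q_online_next = np.array([[1.0, 2.0], [0.5, 0.0]])
q_target_next = np.array([[1.0, 2.0], [3.0, 0.0]])
weights = alignment_weights(q_online_next, q_target_next)

q_pred = np.array([1.0, 1.0])
rewards = np.zeros(2)
dones = np.zeros(2)
loss = weighted_td_loss(q_pred, rewards, dones, q_online_next, q_target_next)
```

In this toy batch the first transition receives the larger weight because the two networks agree on its bootstrap value. When the networks agree on every transition, the weights are all 1 and the loss reduces to the ordinary mean squared TD error.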
