Target-Aligned Reinforcement Learning

Leonard S. Pleiss; James Harrison; Maximilian Schiffer

arXiv:2603.29501·cs.LG·May 20, 2026

TL;DR

TARL is a drop-in refinement for value-based deep reinforcement learning that emphasizes transitions on which the target and online network estimates are well aligned, mitigating the effect of stale target estimates while keeping the stabilizing benefit of target networks.

Contribution

Introduces Target-Aligned Reinforcement Learning (TARL), a simple drop-in refinement for existing algorithms that emphasizes transitions with well-aligned target and online network estimates to improve stability and convergence speed.

Findings

01. 38.18% peak score gain on Atari-10

02. Consistent improvements across various environments

03. Less than 4% increase in wall-clock time

Abstract

Many value-based deep reinforcement learning algorithms rely on target networks (lagged copies of the online network) to stabilize training. While effective, this mechanism introduces a fundamental stability-recency tradeoff: slower target updates improve stability but reduce the recency of learning signals, hindering convergence speed. We propose Target-Aligned Reinforcement Learning (TARL), a simple drop-in refinement for existing algorithms that emphasizes transitions for which the target and online network estimates are highly aligned. By focusing updates on well-aligned targets, TARL mitigates the adverse effects of stale target estimates while retaining the stabilizing benefits of target networks. We empirically demonstrate consistent improvements within discrete and continuous control algorithms across various benchmark environments without any hyperparameter tuning, including…
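To make the idea of emphasizing well-aligned transitions concrete, the following is a minimal NumPy sketch, not the authors' implementation. The abstract does not specify how alignment is measured or how emphasis is applied, so everything here is an assumption for illustration: alignment is taken to be the absolute gap between the online and target networks' next-state value estimates, emphasis is an exponential weight on the per-transition squared TD error, and the names `alignment_weights`, `weighted_td_loss`, and `temperature` are hypothetical.

```python
import numpy as np


def alignment_weights(v_online_next, v_target_next, temperature=1.0):
    """Hypothetical alignment weighting (an assumption, not the paper's rule).

    Transitions where the online and target estimates of the next-state
    value are close receive larger weights.
    """
    gap = np.abs(np.asarray(v_online_next, dtype=float)
                 - np.asarray(v_target_next, dtype=float))
    w = np.exp(-gap / temperature)
    # Normalise so the average weight in the batch is 1.
    return w / w.mean()


def weighted_td_loss(q_sa, rewards, dones, v_online_next, v_target_next,
                     gamma=0.99, temperature=1.0):
    """Squared TD loss, re-weighted per transition by the assumed alignment."""
    # Standard bootstrapped target computed from the (lagged) target network.
    targets = rewards + gamma * (1.0 - dones) * v_target_next
    td_error = targets - q_sa
    w = alignment_weights(v_online_next, v_target_next, temperature)
    return float(np.mean(w * td_error ** 2)), w


# Two transitions: the first has matching estimates, the second does not.
loss, weights = weighted_td_loss(
    q_sa=np.array([1.0, 1.0]),
    rewards=np.array([0.0, 0.0]),
    dones=np.array([0.0, 0.0]),
    v_online_next=np.array([2.0, 5.0]),
    v_target_next=np.array([2.0, 2.0]),
)
```

In this sketch the first transition, where both networks agree, contributes more to the loss than the second. The target network itself is left unchanged, which is one plausible reading of "retaining the stabilizing benefits of target networks"; the paper should be consulted for the actual alignment criterion.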
