ROER: Regularized Optimal Experience Replay
Changling Li, Zhang-Wei Hong, Pulkit Agrawal, Divyansh Garg, Joni Pajarinen

TL;DR
This paper introduces ROER, an experience replay prioritization method derived from regularized occupancy optimization. It improves reinforcement learning performance by shifting the replay data distribution towards the on-policy optimal distribution.
Contribution
It proposes a new TD-error-based prioritization scheme derived from regularized occupancy optimization, demonstrating improved RL performance on continuous control benchmarks.
Findings
ROER outperforms baselines in 6 out of 11 MuJoCo and DM Control tasks.
ROER achieves significant gains in offline-to-online fine-tuning, especially in challenging environments.
The method provides a theoretical foundation linking experience prioritization to occupancy distribution shifts.
Abstract
Experience replay serves as a key component in the success of online reinforcement learning (RL). Prioritized experience replay (PER) reweights experiences by the temporal difference (TD) error, which empirically enhances performance. However, few works have explored the motivation for using TD error. In this work, we provide an alternative perspective on TD-error-based reweighting. We show the connections between experience prioritization and occupancy optimization. By using a regularized RL objective with a divergence regularizer and employing its dual form, we show that an optimal solution to the objective is obtained by shifting the distribution of off-policy data in the replay buffer towards the on-policy optimal distribution using TD-error-based occupancy ratios. Our derivation results in a new pipeline of TD error prioritization. We specifically explore the KL divergence as…
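To make the contrast in the abstract concrete, here is a minimal sketch of two ways to turn TD errors into replay priorities. The first follows the proportional form commonly used in PER (magnitude of the TD error raised to a power). The second is an assumption, not the paper's stated formula: it uses the exponential form exp(δ/β) that typically falls out of the dual of a KL-regularized objective as an occupancy ratio. The function names, the temperature `beta`, the clipping bounds, and the mean normalization are all hypothetical choices made for illustration.

```python
import numpy as np


def per_priorities(td_errors, alpha=0.6, eps=1e-6):
    """Proportional PER-style priority: (|delta| + eps) ** alpha.

    Depends only on the magnitude of the TD error, not its sign.
    """
    return (np.abs(td_errors) + eps) ** alpha


def kl_ratio_priorities(td_errors, beta=1.0, min_w=0.1, max_w=10.0):
    """Hypothetical KL-style priority: exp(delta / beta), normalized and clipped.

    Assumption: with a KL regularizer, the occupancy ratio between the
    optimal and the buffer distribution is proportional to exp(delta / beta).
    The clipping bounds are illustrative guards against extreme weights.
    """
    # Clip the exponent so np.exp cannot overflow.
    w = np.exp(np.clip(td_errors / beta, -20.0, 20.0))
    w = w / w.mean()  # keep the average weight near 1
    return np.clip(w, min_w, max_w)


def sample_indices(priorities, batch_size, rng):
    """Sample buffer indices with probability proportional to priority."""
    probs = priorities / priorities.sum()
    return rng.choice(len(priorities), size=batch_size, p=probs)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    td = rng.normal(size=1000)  # stand-in TD errors, not real data
    batch = sample_indices(kl_ratio_priorities(td, beta=2.0), 32, rng)
    print(batch.shape)
```

One difference this sketch highlights: the magnitude-based priority treats positive and negative TD errors alike, while the exponential form is sign-sensitive and upweights transitions with large positive TD error.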
Taxonomy
Topics: Cognitive Radio Networks and Spectrum Sensing · Energy Efficient Wireless Sensor Networks · Advanced Data Compression Techniques
Methods: Prioritized Experience Replay · Experience Replay
