ROER: Regularized Optimal Experience Replay

Changling Li, Zhang-Wei Hong, Pulkit Agrawal, Divyansh Garg, Joni Pajarinen

arXiv:2407.03995 · cs.LG · September 20, 2024

TL;DR

This paper introduces ROER, an experience replay prioritization method based on regularized occupancy optimization, which improves reinforcement learning performance by shifting the replay data distribution towards the on-policy optimal distribution.

Contribution

It proposes a new TD-error-based prioritization scheme derived from regularized occupancy optimization, demonstrating improved RL performance on continuous control benchmarks.
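The contribution above can be illustrated with a minimal Python sketch of turning TD errors into replay-sampling probabilities. It assumes the KL-regularized case, where dual-form occupancy ratios take an exponential form, so each transition is weighted by exp(δ/β) of its signed TD error δ. The names `kl_priorities` and `sample_batch`, the temperature `beta`, and the clip `max_weight` are hypothetical and are not the official repository's API; the paper's exact TD-error estimator, clipping, and normalization may differ.

```python
import math
import random


def kl_priorities(td_errors, beta=1.0, max_weight=100.0):
    """Map signed TD errors to sampling probabilities (hypothetical helper).

    Each weight is min(exp(delta / beta), max_weight); weights are then
    normalized to sum to 1. beta and max_weight are illustrative settings.
    """
    log_cap = math.log(max_weight)
    # Clip in log space: same result as min(exp(d / beta), max_weight),
    # but avoids OverflowError from math.exp on very large TD errors.
    weights = [math.exp(min(d / beta, log_cap)) for d in td_errors]
    total = sum(weights)
    return [w / total for w in weights]


def sample_batch(buffer, probs, batch_size, rng):
    """Draw transitions with replacement in proportion to their probabilities."""
    return rng.choices(buffer, weights=probs, k=batch_size)


if __name__ == "__main__":
    # Toy buffer of four transitions with made-up TD errors.
    buffer = ["t0", "t1", "t2", "t3"]
    td = [-1.0, 0.0, 1.0, 2.0]
    probs = kl_priorities(td, beta=1.0)
    print(probs)
    print(sample_batch(buffer, probs, batch_size=8, rng=random.Random(0)))
```

Unlike an absolute-value priority, the signed exponent upweights transitions with positive TD error and downweights those with negative TD error, while the clip keeps a single large error from dominating every batch.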

Findings

1. ROER outperforms baselines in 6 out of 11 MuJoCo and DM Control tasks.
2. ROER achieves significant gains in offline-to-online fine-tuning, especially in challenging environments.
3. The method provides a theoretical foundation linking experience prioritization to occupancy distribution shifts.
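The link between prioritization and occupancy shifts can be sketched with the standard convex-duality template used in DICE-style methods. This is a hedged illustration, not the paper's exact derivation: the symbols $\alpha$ (regularization strength), $d^{\mathcal{D}}$ (replay-buffer occupancy), $\mu_0$ (initial-state distribution), and $V$ (dual variable acting as a value function) are assumptions, and the paper's objective, constants, and normalization may differ. The primal maximizes reward minus an $f$-divergence penalty subject to Bellman flow constraints; the dual depends on data only through the TD error $\delta_V$, and the optimal occupancy ratio is a function of that TD error.

```latex
\begin{aligned}
&\max_{d \ge 0}\; \mathbb{E}_{(s,a)\sim d}\big[r(s,a)\big] - \alpha\, D_f\!\left(d \,\|\, d^{\mathcal{D}}\right)
\quad \text{s.t. Bellman flow constraints on } d, \\
&\min_{V}\; (1-\gamma)\,\mathbb{E}_{s_0\sim\mu_0}\big[V(s_0)\big]
 + \alpha\,\mathbb{E}_{(s,a)\sim d^{\mathcal{D}}}\!\left[f^{*}\!\left(\frac{\delta_V(s,a)}{\alpha}\right)\right],
\qquad \delta_V(s,a) = r(s,a) + \gamma\,\mathbb{E}_{s'}\big[V(s')\big] - V(s), \\
&\frac{d^{*}(s,a)}{d^{\mathcal{D}}(s,a)} = \max\!\left(0,\,(f')^{-1}\!\left(\frac{\delta_{V^{*}}(s,a)}{\alpha}\right)\right),
\qquad \text{KL: } f(x) = x\log x \;\Rightarrow\; \frac{d^{*}(s,a)}{d^{\mathcal{D}}(s,a)} = \exp\!\left(\frac{\delta_{V^{*}}(s,a)}{\alpha} - 1\right).
\end{aligned}
```

Here $f^{*}$ is the convex conjugate of $f$. In the KL case the constant factor $e^{-1}$ drops out once the ratios are normalized into sampling probabilities, leaving weights proportional to $\exp(\delta/\alpha)$.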

Abstract

Experience replay serves as a key component in the success of online reinforcement learning (RL). Prioritized experience replay (PER) reweights experiences by their temporal difference (TD) error, empirically enhancing performance. However, few works have explored the motivation for using TD error. In this work, we provide an alternative perspective on TD-error-based reweighting. We show the connections between experience prioritization and occupancy optimization. By using a regularized RL objective with an $f$-divergence regularizer and employing its dual form, we show that an optimal solution to the objective is obtained by shifting the distribution of off-policy data in the replay buffer towards the on-policy optimal distribution using TD-error-based occupancy ratios. Our derivation results in a new pipeline of TD-error prioritization. We specifically explore the KL divergence as…
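For comparison, the PER baseline mentioned in the abstract is commonly implemented in its proportional form: priority $p_i = (|\delta_i| + \epsilon)^{\alpha}$, sampling probability $P(i) = p_i / \sum_k p_k$, and importance-sampling weights $w_i = (N \cdot P(i))^{-\beta}$ normalized by their maximum. The minimal Python sketch below shows that scheme; the function names and the default `alpha`, `beta`, and `eps` values are illustrative choices, not settings taken from this paper.

```python
def per_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Proportional PER: p_i = (|delta_i| + eps) ** alpha, P(i) = p_i / sum_k p_k."""
    priorities = [(abs(d) + eps) ** alpha for d in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]


def per_is_weights(probs, beta=0.4):
    """Importance-sampling weights w_i = (N * P(i)) ** -beta, scaled so max w_i = 1."""
    n = len(probs)
    raw = [(n * p) ** (-beta) for p in probs]
    largest = max(raw)
    return [w / largest for w in raw]


if __name__ == "__main__":
    td = [-2.0, 0.0, 2.0]
    probs = per_probabilities(td)
    print(probs)
    print(per_is_weights(probs))
```

Because the priority depends only on $|\delta|$, transitions with TD errors of -2 and +2 are sampled equally often, and the rarely sampled transition receives the largest correction weight.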

Code & Models

Repositories

xavierchanglingli/regularized-optimal-experience-replay (JAX, official)

Taxonomy

Methods: Prioritized Experience Replay · Experience Replay