Boosting Soft Actor-Critic: Emphasizing Recent Experience without Forgetting the Past

Che Wang; Keith Ross

arXiv:1906.04009·cs.LG·June 11, 2019·40 cites

TL;DR

This paper introduces Emphasizing Recent Experience (ERE), a novel sampling technique for Soft Actor-Critic (SAC) that improves sample efficiency by focusing on recent data without losing past information, outperforming standard SAC.

Contribution

The paper proposes ERE, a new off-policy sampling method for SAC, and demonstrates its effectiveness on continuous control tasks, including its synergy with Prioritized Experience Replay (PER).

Findings

1. ERE improves sample efficiency over vanilla SAC.
2. Combining ERE with PER yields further performance gains.
3. SAC+ERE outperforms other variants on MuJoCo benchmarks.

Abstract

Soft Actor-Critic (SAC) is an off-policy actor-critic deep reinforcement learning (DRL) algorithm based on maximum entropy reinforcement learning. By combining off-policy updates with an actor-critic formulation, SAC achieves state-of-the-art performance on a range of continuous-action benchmark tasks, outperforming prior on-policy and off-policy methods. The off-policy method employed by SAC samples data uniformly from past experience when performing parameter updates. We propose Emphasizing Recent Experience (ERE), a simple but powerful off-policy sampling technique, which emphasizes recently observed data while not forgetting the past. The ERE algorithm samples more aggressively from recent experience, and also orders the updates to ensure that updates from old data do not overwrite updates from new data. We compare vanilla SAC and SAC+ERE, and show that ERE is more sample efficient…
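
The sampling idea in the abstract can be illustrated with a minimal sketch. The abstract as shown here does not give the schedule, so the geometric shrinking rule below, the function names `ere_window_sizes` and `ere_sample_indices`, and the default values of `eta` and `c_min` are all assumptions made for illustration. The sketch assumes a buffer indexed from oldest (0) to newest (`buffer_size - 1`): the first update in a round draws from nearly the whole buffer, and each later update draws from a smaller window of the most recent transitions, so the updates applied last are the ones based on the newest data.

```python
import random


def ere_window_sizes(buffer_size, num_updates, eta=0.996, c_min=5000):
    """Window size (count of most-recent transitions) for each update in a round.

    Assumed schedule: the window shrinks geometrically with the update
    index k and is never smaller than c_min (or the buffer itself).
    """
    sizes = []
    for k in range(1, num_updates + 1):
        c_k = int(buffer_size * eta ** (k * 1000 / num_updates))
        sizes.append(min(buffer_size, max(c_k, c_min)))
    return sizes


def ere_sample_indices(buffer_size, num_updates, batch_size, rng=None, **schedule):
    """Draw one mini-batch of buffer indices per update.

    Each batch is sampled uniformly from the most recent c_k transitions,
    so later updates in the round see only newer data.
    """
    rng = rng or random.Random(0)
    batches = []
    for c_k in ere_window_sizes(buffer_size, num_updates, **schedule):
        start = buffer_size - c_k  # oldest index still eligible for this update
        batches.append([rng.randrange(start, buffer_size) for _ in range(batch_size)])
    return batches


if __name__ == "__main__":
    sizes = ere_window_sizes(buffer_size=100_000, num_updates=1000)
    print("first window:", sizes[0], "last window:", sizes[-1])
```

Setting `eta=1.0` in this sketch keeps every window at the full buffer size, which reduces to the uniform sampling used by vanilla SAC.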

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control

Methods: Experience Replay · Q-Learning