Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor

Tuomas Haarnoja; Aurick Zhou; Pieter Abbeel; Sergey Levine

arXiv:1801.01290 · cs.LG · August 10, 2018 · 3.5k cites

TL;DR

The paper introduces the soft actor-critic algorithm, an off-policy deep reinforcement learning method based on maximum entropy principles, which improves sample efficiency, stability, and performance on continuous control tasks.

Contribution

It presents a novel stochastic actor-critic algorithm that combines off-policy updates with maximum entropy reinforcement learning, achieving state-of-the-art results and enhanced stability.

Findings

01 Outperforms prior methods on continuous control benchmarks

02 Achieves high stability across different random seeds

03 Demonstrates improved sample efficiency and convergence

Abstract

Model-free deep reinforcement learning (RL) algorithms have been demonstrated on a range of challenging decision making and control tasks. However, these methods typically suffer from two major challenges: very high sample complexity and brittle convergence properties, which necessitate meticulous hyperparameter tuning. Both of these challenges severely limit the applicability of such methods to complex, real-world domains. In this paper, we propose soft actor-critic, an off-policy actor-critic deep RL algorithm based on the maximum entropy reinforcement learning framework. In this framework, the actor aims to maximize expected reward while also maximizing entropy. That is, to succeed at the task while acting as randomly as possible. Prior deep RL methods based on this framework have been formulated as Q-learning methods. By combining off-policy updates with a stable stochastic…
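The maximum entropy objective described in the abstract can be illustrated with a small sketch. It assumes a discrete action space, a single undiscounted trajectory rather than an expectation, and a temperature `alpha` of 0.2. These choices, along with the function names, are illustrative assumptions and are not taken from the paper.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def max_entropy_return(rewards, action_probs, alpha=0.2):
    """Sum over time steps of r_t + alpha * H(pi(.|s_t)).

    `alpha` (the temperature weighting the entropy bonus) and the
    discrete action space are assumptions made for illustration.
    """
    return sum(r + alpha * entropy(p) for r, p in zip(rewards, action_probs))


# Two hypothetical policies that collect the same rewards on a 2-step trajectory.
rewards = [1.0, 1.0]
uniform = [[0.5, 0.5], [0.5, 0.5]]  # acts as randomly as possible
greedy = [[1.0, 0.0], [1.0, 0.0]]   # fully deterministic, zero entropy

uniform_score = max_entropy_return(rewards, uniform)
greedy_score = max_entropy_return(rewards, greedy)
```

With equal task reward, the uniform policy scores higher than the deterministic one under this objective, which is the sense in which the actor is rewarded for succeeding "while acting as randomly as possible".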

Taxonomy

Topics: Reinforcement Learning in Robotics · Neural dynamics and brain function · Advanced Memory and Neural Computing

Methods: Experience Replay · Dense Connections · Adam · Soft Actor Critic