Soft Actor-Critic Algorithms and Applications

Tuomas Haarnoja, Aurick Zhou, Kristian Hartikainen, George Tucker, Sehoon Ha, Jie Tan, Vikash Kumar, Henry Zhu, Abhishek Gupta, Pieter Abbeel, and Sergey Levine

arXiv:1812.05905 · cs.LG · September 16, 2019 · 1.9k cites

TL;DR

The paper introduces Soft Actor-Critic (SAC), an off-policy deep reinforcement learning algorithm based on maximum entropy principles, which improves training stability, sample efficiency, and robustness, making it suitable for real-world robotics applications.

Contribution

The paper extends SAC with modifications for faster training and hyperparameter stability, including automatic temperature tuning, demonstrating state-of-the-art performance on benchmarks and real-world tasks.
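The automatic temperature tuning mentioned above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes the temperature objective J(α) = E[−α (log π(a|s) + H̄)], where H̄ is a target entropy supplied by the user. The function names, the log-α parameterization, and the learning rate are hypothetical choices made for this example.

```python
import math

def temperature_loss(log_alpha, log_probs, target_entropy):
    """Monte Carlo estimate of J(alpha) = E[-alpha * (log pi(a|s) + target_entropy)].

    log_probs: log pi(a|s) for a batch of actions sampled from the current policy.
    """
    alpha = math.exp(log_alpha)  # alpha = exp(log_alpha) is always positive
    return sum(-alpha * (lp + target_entropy) for lp in log_probs) / len(log_probs)

def temperature_step(log_alpha, log_probs, target_entropy, lr=3e-4):
    """One gradient-descent step on log_alpha (lr is an assumed value)."""
    alpha = math.exp(log_alpha)
    # Mean of (log pi + target entropy); positive when the policy's
    # estimated entropy is below the target.
    mean_gap = sum(lp + target_entropy for lp in log_probs) / len(log_probs)
    # Chain rule: dJ/d(log_alpha) = dJ/d(alpha) * alpha = -alpha * mean_gap
    grad = -alpha * mean_gap
    return log_alpha - lr * grad
```

When the policy's estimated entropy falls below the target, the step raises α, so the entropy term gets more weight. When entropy is above the target, α shrinks. A common heuristic, assumed here rather than required, sets the target entropy to the negative of the action dimensionality.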

Findings

01. SAC outperforms prior methods in sample efficiency and asymptotic performance.

02. SAC demonstrates high stability across different random seeds.

03. SAC is effective in real-world robotics tasks like locomotion and manipulation.

Abstract

Model-free deep reinforcement learning (RL) algorithms have been successfully applied to a range of challenging sequential decision-making and control tasks. However, these methods typically suffer from two major challenges: high sample complexity and brittleness to hyperparameters. Both of these challenges limit the applicability of such methods to real-world domains. In this paper, we describe Soft Actor-Critic (SAC), our recently introduced off-policy actor-critic algorithm based on the maximum entropy RL framework. In this framework, the actor aims to simultaneously maximize expected return and entropy; that is, to succeed at the task while acting as randomly as possible. We extend SAC to incorporate a number of modifications that accelerate training and improve stability with respect to the hyperparameters, including a constrained formulation that automatically tunes the…
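The maximum entropy objective described in the abstract, in which the actor maximizes expected return and entropy together, can be written roughly as below. The notation is an assumption for illustration: r is the reward, ρ_π the state-action distribution induced by policy π, H the entropy, and α a temperature that weights entropy against reward.

```latex
\pi^{*} = \arg\max_{\pi} \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_{\pi}}
  \left[ r(s_t, a_t) + \alpha \, \mathcal{H}\!\left( \pi(\,\cdot \mid s_t) \right) \right]
```

Setting α to zero recovers the standard expected-return objective. Larger values of α favor more random behavior.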

Taxonomy

Topics: Reinforcement Learning in Robotics · Artificial Intelligence in Games · Evolutionary Algorithms and Applications

Methods: Experience Replay · Dense Connections · Adam · Soft Actor-Critic (Autotuned Temperature)