Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, Sergey Levine

TL;DR
The paper introduces the soft actor-critic algorithm, an off-policy deep reinforcement learning method based on maximum entropy principles, which improves sample efficiency, stability, and performance on continuous control tasks.
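The sketch below is a rough illustration of what an off-policy, maximum-entropy update regresses toward. It computes soft value and soft Q targets for a batch of transitions replayed from a buffer. As we understand the paper, it trains a state-value network V alongside two Q-functions: V is regressed toward min(Q1, Q2)(s, a) − log π(a|s) with a freshly sampled a ~ π, and Q is regressed toward r + γ V_target(s'). The `done` mask, the explicit temperature `alpha`, and all function names are hypothetical additions for illustration. Network training itself is omitted.

```python
import numpy as np


def soft_value_target(q1, q2, log_pi, alpha=1.0):
    """Regression target for V(s): min(Q1, Q2)(s, a) - alpha * log pi(a|s).

    `a` should be a fresh action sampled from the current policy at s
    (not the replayed action), so this is a one-sample estimate of the
    expectation over a ~ pi. `alpha` is a hypothetical explicit temperature;
    alpha=1.0 gives the form without a separate temperature term.
    """
    return np.minimum(q1, q2) - alpha * log_pi


def soft_q_target(reward, done, v_next_target, gamma=0.99):
    """Regression target for Q(s, a): r + gamma * V_target(s').

    `done` (hypothetical, 1.0 at terminal transitions) stops bootstrapping
    past the end of an episode.
    """
    return reward + gamma * (1.0 - done) * v_next_target


# Toy batch of two replayed transitions (made-up numbers).
reward = np.array([1.0, 0.5])
done = np.array([0.0, 1.0])
v_next = np.array([2.0, 3.0])  # V_target(s') from a slowly updated target network
print(soft_q_target(reward, done, v_next, gamma=0.9))
```

Because the targets depend only on stored transitions plus fresh samples from the current policy, the same batch can be reused regardless of which policy originally collected it. This is what makes the update off-policy.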
Contribution
It presents a novel stochastic actor-critic algorithm that combines off-policy updates with maximum entropy reinforcement learning, achieving state-of-the-art results and enhanced stability.
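One common way to realize a stochastic actor for bounded continuous actions is a tanh-squashed diagonal Gaussian sampled with the reparameterization trick. To our understanding, this is also how the paper enforces action bounds. Squashing requires a change-of-variables correction to the log-probability. The NumPy sketch below is forward-only; in practice the gradients would come from an autodiff framework. The names `sample_squashed_gaussian` and `actor_objective` and the 1e-6 stabilizer are assumptions for illustration. `actor_objective` mirrors a policy loss of the form E[α log π(a|s) − Q(s, a)] with a reparameterized action a.

```python
import numpy as np


def gaussian_log_prob(u, mean, log_std):
    """Log-density of a diagonal Gaussian, summed over the last (action) axis."""
    z = (u - mean) / np.exp(log_std)
    return np.sum(-0.5 * z**2 - log_std - 0.5 * np.log(2.0 * np.pi), axis=-1)


def sample_squashed_gaussian(mean, log_std, rng):
    """Reparameterized sample a = tanh(mean + std * eps), eps ~ N(0, I), and log pi(a|s)."""
    eps = rng.standard_normal(mean.shape)
    u = mean + np.exp(log_std) * eps  # pre-squash Gaussian sample
    a = np.tanh(u)  # squashed into (-1, 1)
    # Change of variables for a = tanh(u):
    # log pi(a|s) = log mu(u|s) - sum_i log(1 - tanh(u_i)^2).
    # The small constant (an assumption) guards against log(0) when |u| is large.
    log_pi = gaussian_log_prob(u, mean, log_std) - np.sum(np.log(1.0 - a**2 + 1e-6), axis=-1)
    return a, log_pi


def actor_objective(log_pi, q, alpha=1.0):
    """Batch estimate of E[alpha * log pi(a|s) - Q(s, a)], to be minimized by the actor."""
    return np.mean(alpha * log_pi - q)


rng = np.random.default_rng(0)
mean = np.zeros((4, 2))  # hypothetical policy-network outputs: batch of 4 states, 2-D actions
log_std = np.full((4, 2), -0.5)
actions, log_pi = sample_squashed_gaussian(mean, log_std, rng)
print(actions.shape, log_pi.shape)  # (4, 2) (4,)
```

Sampling through `mean + std * eps` rather than drawing `u` directly keeps the action a differentiable function of the policy outputs. This is what allows the actor to be trained by backpropagating through Q(s, a).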
Findings
Outperforms prior methods on continuous control benchmarks
Achieves high stability across different random seeds
Demonstrates improved sample efficiency and convergence
Abstract
Model-free deep reinforcement learning (RL) algorithms have been demonstrated on a range of challenging decision making and control tasks. However, these methods typically suffer from two major challenges: very high sample complexity and brittle convergence properties, which necessitate meticulous hyperparameter tuning. Both of these challenges severely limit the applicability of such methods to complex, real-world domains. In this paper, we propose soft actor-critic, an off-policy actor-critic deep RL algorithm based on the maximum entropy reinforcement learning framework. In this framework, the actor aims to maximize expected reward while also maximizing entropy. That is, to succeed at the task while acting as randomly as possible. Prior deep RL methods based on this framework have been formulated as Q-learning methods. By combining off-policy updates with a stable stochastic…
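Following the paper, the objective described in the abstract can be written as J(π) = Σ_t E_{(s_t, a_t) ∼ ρ_π}[ r(s_t, a_t) + α H(π(·|s_t)) ], where α is a temperature that trades off reward against entropy. The sketch below evaluates the bracketed quantity along a single trajectory. It assumes a discrete action distribution so the entropy can be computed exactly, whereas the paper itself targets continuous actions. The optional discount `gamma` and the function names are illustrative additions.

```python
import numpy as np


def categorical_entropy(probs):
    """H(pi(.|s)) = -sum_a pi(a|s) log pi(a|s); zero-probability actions contribute 0."""
    p = np.asarray(probs, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))


def soft_return(rewards, action_probs, alpha=0.2, gamma=1.0):
    """Entropy-augmented return of one trajectory: sum_t gamma^t * (r_t + alpha * H(pi(.|s_t)))."""
    total = 0.0
    for t, (r, probs) in enumerate(zip(rewards, action_probs)):
        total += gamma**t * (r + alpha * categorical_entropy(probs))
    return total


# Same rewards, two policies: the more random one earns an entropy bonus.
rewards = [1.0, 1.0]
greedy = [[1.0, 0.0], [1.0, 0.0]]
uniform = [[0.5, 0.5], [0.5, 0.5]]
print(soft_return(rewards, greedy, alpha=0.5))   # 2.0
print(soft_return(rewards, uniform, alpha=0.5))  # 2 + ln 2, about 2.693
```

With equal task reward, the objective prefers the policy that acts more randomly. This is the "succeed at the task while acting as randomly as possible" trade-off, with α setting its strength.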
Taxonomy
Topics: Reinforcement Learning in Robotics · Neural dynamics and brain function · Advanced Memory and Neural Computing
Methods: Experience Replay · Dense Connections · Adam · Soft Actor Critic
