Target Entropy Annealing for Discrete Soft Actor-Critic
Yaosheng Xu, Dailin Hu, Litian Liang, Stephen McAleer, Pieter Abbeel, and Roy Fox

TL;DR
This paper introduces TES-SAC, which anneals (schedules) the target entropy parameter in discrete Soft Actor-Critic (SAC) to improve its performance in discrete domains such as Atari games.
Contribution
The paper proposes Target Entropy Scheduled SAC (TES-SAC), a novel annealing approach for the target entropy parameter in discrete SAC, addressing its poor performance in discrete environments.
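To make "annealing the target entropy" concrete, here is a minimal sketch of one possible schedule. It is a generic exponential decay chosen for illustration, not necessarily the schedule TES-SAC uses. The function name `annealed_target_entropy`, the parameters `start_ratio`, `end_ratio`, and `anneal_steps`, and their default values are all hypothetical. The target is expressed as a fraction of log|A|, the entropy of a uniform policy over |A| discrete actions.

```python
import math


def annealed_target_entropy(step, num_actions, start_ratio=0.98,
                            end_ratio=0.3, anneal_steps=100_000):
    """Hypothetical exponential schedule for the target entropy H_bar.

    H_bar is a fraction of the maximum entropy log|A| (uniform policy over
    `num_actions` actions). The fraction decays geometrically from
    `start_ratio` to `end_ratio` over `anneal_steps` steps, then stays fixed.
    """
    max_entropy = math.log(num_actions)
    # Fraction of the annealing period completed, clipped to [0, 1].
    progress = min(step / anneal_steps, 1.0)
    # Geometric interpolation: start * (end / start) ** progress.
    ratio = start_ratio * (end_ratio / start_ratio) ** progress
    return ratio * max_entropy
```

For a game with 6 actions, this target starts at about 0.98·log 6 ≈ 1.76 nats and decays to 0.3·log 6 ≈ 0.54 nats. Any monotonically decreasing schedule fits the same slot; the geometric form simply makes equal-length intervals shrink the target by equal factors.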
Findings
TES-SAC improves performance on Atari 2600 games compared with SAC using a constant target entropy.
Scheduled target entropy enhances SAC stability in discrete domains.
Analysis shows scheduling affects policy entropy and learning efficiency.
Abstract
Soft Actor-Critic (SAC) is considered the state-of-the-art algorithm in continuous action space settings. It uses the maximum entropy framework for efficiency and stability, and applies a heuristic temperature Lagrange term to tune the temperature α, which determines how "soft" the policy should be. Counter-intuitively, empirical evidence shows that SAC does not perform well in discrete domains. In this paper we investigate possible explanations for this phenomenon and propose Target Entropy Scheduled SAC (TES-SAC), an annealing method for the target entropy parameter applied to SAC. Target entropy is a constant in the temperature Lagrange term and represents the target policy entropy in discrete SAC. We compare our method on Atari 2600 games against SAC with different constant target entropies, and analyze how our scheduling affects SAC.
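The role of the target entropy in the temperature Lagrange term can be illustrated with a minimal single-state sketch for discrete SAC. Because the action set is finite, the expectation over actions is computed exactly from the policy's probabilities: sum_a π(a)·(-α(log π(a) + H_bar)) = α(H(π) - H_bar), where H(π) is the policy entropy and H_bar the target entropy. The function names, the learning rate, and optimizing log α instead of α are assumptions made for illustration, not details taken from the paper.

```python
import math


def policy_entropy(probs):
    """Shannon entropy H(pi) = -sum_a pi(a) log pi(a) of a categorical policy."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def temperature_loss(log_alpha, probs, target_entropy):
    """Temperature objective for one state in discrete SAC.

    J(alpha) = sum_a pi(a) * (-alpha * (log pi(a) + H_bar))
             = alpha * (H(pi) - H_bar)
    """
    alpha = math.exp(log_alpha)
    return alpha * (policy_entropy(probs) - target_entropy)


def update_log_alpha(log_alpha, probs, target_entropy, lr=3e-4):
    """One gradient-descent step on log(alpha).

    dJ/d(log alpha) = alpha * (H(pi) - H_bar), so alpha shrinks when the
    policy is more random than the target and grows when it is less random.
    """
    alpha = math.exp(log_alpha)
    grad = alpha * (policy_entropy(probs) - target_entropy)
    return log_alpha - lr * grad
```

For example, a uniform policy over 4 actions has entropy log 4 ≈ 1.386 nats; with a target of 0.98·log 4 (a value often used as a default in discrete SAC implementations, assumed here), the entropy exceeds the target and the step lowers log α. In training, this update would be averaged over a batch of sampled states, and a scheduled target simply replaces the constant `target_entropy` argument at each step.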
