Target Entropy Annealing for Discrete Soft Actor-Critic

Yaosheng Xu; Dailin Hu; Litian Liang; Stephen McAleer; Pieter Abbeel; Roy Fox

arXiv:2112.02852 · cs.LG · December 7, 2021

TL;DR

This paper introduces TES-SAC, an annealing method for target entropy in discrete Soft Actor-Critic, improving its performance in discrete domains like Atari games by scheduling the target entropy parameter.

Contribution

The paper proposes Target Entropy Scheduled SAC (TES-SAC), a novel annealing approach for the target entropy parameter in discrete SAC, addressing its poor performance in discrete environments.
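
To make the idea of scheduling the target entropy concrete, here is a minimal sketch of an annealing schedule. The summary above does not state the schedule the paper uses, so the linear shape, the function name `annealed_target_entropy`, and the `start_frac` / `end_frac` values are all assumptions chosen for illustration. The one fixed fact it relies on is that a policy over `num_actions` discrete actions has maximum entropy `log(num_actions)`.

```python
import math


def annealed_target_entropy(step, total_steps, num_actions,
                            start_frac=0.98, end_frac=0.3):
    """Hypothetical linear schedule for the target entropy.

    The target is expressed as a fraction of the maximum entropy
    log(num_actions) of a discrete policy. The fraction moves linearly
    from start_frac to end_frac over total_steps and is held constant
    afterwards. The shape and the default fractions are illustrative
    assumptions, not values taken from the paper.
    """
    max_entropy = math.log(num_actions)
    # Clamp training progress to [0, 1] so the target stays at end_frac
    # once total_steps has been reached.
    progress = min(max(step / total_steps, 0.0), 1.0)
    frac = start_frac + (end_frac - start_frac) * progress
    return frac * max_entropy
```

Calling `annealed_target_entropy(step, total_steps, num_actions)` once per update would replace the constant target entropy of standard SAC with a value that starts near the uniform-policy entropy and decreases as training proceeds.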

Findings

01. TES-SAC improves performance on Atari 2600 games.

02. Scheduled target entropy enhances SAC stability in discrete domains.

03. Analysis shows scheduling affects policy entropy and learning efficiency.

Abstract

Soft Actor-Critic (SAC) is considered the state-of-the-art algorithm in continuous action space settings. It uses the maximum entropy framework for efficiency and stability, and applies a heuristic temperature Lagrange term to tune the temperature $α$, which determines how "soft" the policy should be. Counter-intuitively, empirical evidence shows that SAC does not perform well in discrete domains. In this paper, we investigate possible explanations for this phenomenon and propose Target Entropy Scheduled SAC (TES-SAC), an annealing method for the target entropy parameter of SAC. The target entropy is a constant in the temperature Lagrange term and represents the target policy entropy in discrete SAC. We compare our method on Atari 2600 games against SAC with different constant target entropies, and analyze how our scheduling affects SAC.
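
The role of the target entropy in the temperature Lagrange term can be sketched as follows. This is a simplified illustration in plain Python, not the authors' implementation. It assumes the commonly used objective $J(α) = α\,(\mathcal{H}(π) - \bar{\mathcal{H}})$, where $\bar{\mathcal{H}}$ is the target entropy, with the entropy computed exactly from the discrete action probabilities and $α$ parameterized through its logarithm. The function names and the learning rate are hypothetical.

```python
import math


def policy_entropy(probs):
    """Exact entropy of one discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def temperature_step(log_alpha, batch_probs, target_entropy, lr=1e-2):
    """One gradient-descent step on J(alpha) = alpha * (H - target).

    H is the mean policy entropy over the batch. With alpha = exp(log_alpha),
    the gradient of J with respect to log_alpha is alpha * (H - target), so:
      - if the policy is more random than the target, alpha shrinks;
      - if the policy is less random than the target, alpha grows.
    """
    mean_entropy = sum(policy_entropy(p) for p in batch_probs) / len(batch_probs)
    alpha = math.exp(log_alpha)
    grad = alpha * (mean_entropy - target_entropy)
    return log_alpha - lr * grad


# Usage: a uniform policy over 4 actions has entropy log(4), above the
# target of 0.5, so the temperature is pushed down.
new_log_alpha = temperature_step(0.0, [[0.25, 0.25, 0.25, 0.25]], 0.5)
```

Under this update the temperature settles only when the policy entropy matches the target, which is why the target value (constant in standard SAC, scheduled in TES-SAC) directly controls how stochastic the learned policy remains.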

Taxonomy

Topics: Artificial Intelligence in Games · Reinforcement Learning in Robotics · Digital Games and Media

Methods: Dilated Convolution · Average Pooling · 1x1 Convolution · Global Average Pooling · Convolution · Switchable Atrous Convolution