Generalizing soft actor-critic algorithms to discrete action spaces

Le Zhang; Yong Gu; Xin Zhao; Yanshuo Zhang; Shu Zhao; Yifei Jin; Xinxin Wu

arXiv:2407.11044 · cs.LG · July 17, 2024

TL;DR

This paper introduces a discrete variant of the soft actor-critic algorithm that enables off-policy learning in discrete action spaces, and shows that it outperforms the previous state of the art on Atari games while requiring less training time.

Contribution

It proposes a practical discrete soft actor-critic algorithm and integrates it into BBF, an advanced Rainbow variant, achieving state-of-the-art results with lower replay ratios and faster training.

Findings

01

SAC-BBF improves the previous state-of-the-art interquartile mean (IQM) from 1.045 to 1.088.

02

SAC-BBF trains in one-third the time of prior algorithms to reach similar performance.

03

SAC-BBF achieves super-human performance with a low replay ratio (RR) of 2.
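
The IQM figures above aggregate per-game scores into a single number. As a rough illustration, assuming a flat list of human-normalized scores (one per game and run), the interquartile mean discards the lowest and highest 25% of scores and averages the rest, which makes it less sensitive to a few outlier games than the plain mean. The helper names below are hypothetical; this is not the paper's evaluation code, and how runs are pooled before trimming is an assumption.

```python
from typing import Sequence


def human_normalized(score: float, random_score: float, human_score: float) -> float:
    """Hypothetical helper: 0.0 = random-policy score, 1.0 = human reference score."""
    return (score - random_score) / (human_score - random_score)


def interquartile_mean(scores: Sequence[float]) -> float:
    """Mean of the middle 50% of scores (bottom and top 25% dropped).

    As in scipy.stats.trim_mean(x, 0.25), the number of values cut from
    each tail is rounded down, so trimming is conservative.
    """
    if not scores:
        raise ValueError("need at least one score")
    ordered = sorted(scores)
    cut = int(0.25 * len(ordered))  # values removed from each tail
    middle = ordered[cut:len(ordered) - cut]
    return sum(middle) / len(middle)
```

With the hypothetical scores `[0.1, 0.4, 0.8, 1.0, 1.2, 1.5, 3.0, 9.0]`, the plain mean is 2.125 while the IQM is 1.125, because the single outlier 9.0 is trimmed away.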

Abstract

Atari is a suite of video games used by reinforcement learning (RL) researchers to test the effectiveness of learning algorithms. Receiving only the raw pixels and the game score, the agent learns to develop sophisticated strategies, even reaching a level comparable to that of a professional human games tester. Ideally, we also want an agent that requires very few interactions with the environment. Previous competitive model-free algorithms for this task use the value-based Rainbow algorithm without any policy head. In this paper, we change this by proposing a practical discrete variant of the soft actor-critic (SAC) algorithm. The new variant enables off-policy learning using policy heads for discrete domains. By incorporating it into the advanced Rainbow variant, i.e., "bigger, better, faster" (BBF), the resulting SAC-BBF improves the previous state-of-the-art interquartile mean (IQM) from…
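
The abstract does not spell out the update rules. For orientation, the sketch below shows the commonly used discrete SAC formulation, in which the expectations over actions that continuous SAC estimates with sampled, reparameterized actions are instead computed exactly as a finite sum over the policy's categorical distribution. It is a per-state, framework-free illustration under that assumption; all function names are hypothetical, and the paper's actual variant inside BBF may include modifications not visible here.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over the policy head's action logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def soft_state_value(logits: Sequence[float], q_values: Sequence[float], alpha: float) -> float:
    """V(s) = sum_a pi(a|s) * (Q(s,a) - alpha * log pi(a|s)).

    With discrete actions the expectation over a ~ pi is a finite sum,
    so no action sampling or reparameterization is needed.
    """
    log_pi = log_softmax(logits)
    return sum(math.exp(lp) * (q - alpha * lp) for lp, q in zip(log_pi, q_values))


def critic_target(reward: float, done: bool, gamma: float,
                  next_logits: Sequence[float], next_target_q: Sequence[float],
                  alpha: float) -> float:
    """Soft Bellman target y = r + gamma * (1 - done) * V(s'), using target-critic Q-values."""
    return reward + gamma * (1.0 - float(done)) * soft_state_value(next_logits, next_target_q, alpha)


def policy_loss(logits: Sequence[float], q_values: Sequence[float], alpha: float) -> float:
    """J_pi(s) = sum_a pi(a|s) * (alpha * log pi(a|s) - Q(s,a)), which equals -V(s)."""
    return -soft_state_value(logits, q_values, alpha)
```

In a real agent these quantities would be batched tensors, the Q-values in the policy loss would be treated as constants (no gradient flowing into the critic), and the target Q-values are often the minimum of two target critics; these choices are assumptions for illustration, not details taken from the paper.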

Taxonomy

Topics: Advanced Control Systems Optimization · Reinforcement Learning in Robotics