Soft Actor-Critic With Integer Actions

Ting-Han Fan; Yubo Wang

arXiv:2109.08512·cs.LG·March 15, 2022·1 cites

TL;DR

This paper combines Soft Actor-Critic (SAC) with a low-dimensional integer reparameterization to handle high-dimensional integer action spaces. The resulting method matches continuous-action SAC on robot control tasks and outperforms Proximal Policy Optimization (PPO) on power distribution system control tasks.

Contribution

It proposes a low-dimensional integer reparameterization for SAC that exploits the comparability (ordering) of integer actions and therefore needs no one-hot encoding, which makes SAC applicable to integer-action problems in industrial and robotic control.

Findings

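01 SAC with integer reparameterization performs as well as continuous-action SAC on robot control tasks.

02 SAC with integer reparameterization outperforms PPO on power distribution system control tasks.

03 The reparameterization avoids one-hot encoding, which keeps the action representation low-dimensional.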

Abstract

Reinforcement learning is well-studied under discrete actions. The integer-action setting is popular in industry yet remains challenging because of its high dimensionality. To this end, we study reinforcement learning under integer actions by combining the Soft Actor-Critic (SAC) algorithm with an integer reparameterization. Our key observation is that the discrete structure of integer actions can be simplified using their comparability property. Hence, the proposed integer reparameterization does not need one-hot encoding and is of low dimensionality. Experiments show that the proposed SAC under integer actions performs as well as the continuous-action version on robot control tasks and outperforms Proximal Policy Optimization on power distribution system control tasks.
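The idea of replacing a one-hot action with a single integer per action dimension can be illustrated with a small forward-pass sketch. The abstract does not spell out the construction, so the version below rests on assumptions: it draws a relaxed categorical sample with the Gumbel-Softmax trick, hardens it to a one-hot vector, and takes the inner product with [0, 1, ..., K-1] to obtain an integer. All function names are hypothetical, and the sketch covers only sampling. In an autodiff framework the hard one-hot would typically be paired with the gradient of the soft sample (a straight-through estimator), which plain Python cannot show.

```python
import math
import random


def gumbel_softmax(logits, temperature, rng):
    """Draw a relaxed categorical sample (a probability vector)."""
    noisy = []
    for logit in logits:
        # Clamp the uniform draw so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))  # Gumbel(0, 1) noise
        noisy.append((logit + gumbel) / temperature)
    peak = max(noisy)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def to_one_hot(soft_sample):
    """Harden a soft sample into a one-hot vector via argmax."""
    k = max(range(len(soft_sample)), key=soft_sample.__getitem__)
    return [1 if i == k else 0 for i in range(len(soft_sample))]


def integer_action(logits_per_dim, temperature=1.0, rng=None):
    """Sample one integer per action dimension (assumed construction)."""
    if rng is None:
        rng = random.Random()
    action = []
    for logits in logits_per_dim:
        one_hot = to_one_hot(gumbel_softmax(logits, temperature, rng))
        # Inner product with [0, 1, ..., K-1] collapses K entries to one integer.
        action.append(sum(i * h for i, h in enumerate(one_hot)))
    return action


# Hypothetical setup: 4 action dimensions with 10 integer levels each.
rng = random.Random(0)
logits_per_dim = [[0.0] * 10 for _ in range(4)]
action = integer_action(logits_per_dim, temperature=0.5, rng=rng)

one_hot_size = sum(len(logits) for logits in logits_per_dim)
print("integer action:", action)
print("entries passed downstream:", len(action), "instead of", one_hot_size)
```

In this toy setup the action handed to a critic has 4 entries rather than the 40 a concatenated one-hot encoding would need, which is the kind of dimensionality saving the abstract refers to.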

Taxonomy

Topics: Reinforcement Learning in Robotics · Smart Grid Security and Resilience · Adversarial Robustness in Machine Learning

Methods: Average Pooling · Global Average Pooling · Convolution · Dilated Convolution · 1x1 Convolution · Switchable Atrous Convolution