Soft Decomposed Policy-Critic: Bridging the Gap for Effective Continuous Control with Discrete RL

Yechen Zhang; Jian Sun; Gang Wang; Zhuo Li; Wei Chen

arXiv:2308.10203·cs.LG·August 22, 2023

TL;DR

This paper introduces the Soft Decomposed Policy-Critic (SDPC) architecture, which combines discrete RL methods with soft RL and actor-critic techniques to handle continuous control tasks without the dimensional explosion that normally hinders discrete RL in such settings.

Contribution

The paper proposes the SDPC framework, which discretizes each action dimension independently and uses a shared critic network to maximize the soft $Q$-function, integrating soft RL and actor-critic techniques with discrete RL for continuous control.

Findings

01

SDPC outperforms state-of-the-art continuous RL algorithms on MuJoCo and BipedalWalker tasks.

02

Discretizing each action dimension independently addresses the dimensional explosion that hinders discrete RL in continuous control.

03

Empirical validation shows superior performance of SDPC in various benchmarks.
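The dimensional explosion mentioned in the findings can be made concrete with a small count. The sketch below is only an illustration, not code from the paper: the function names and the example values (11 bins, 6 action dimensions) are assumptions chosen for the arithmetic. It compares the number of discrete actions when the whole action vector is discretized jointly with the number of outputs needed when each dimension is discretized on its own.

```python
def joint_action_count(bins_per_dim: int, action_dims: int) -> int:
    """Discrete actions needed when the full action vector is discretized jointly."""
    return bins_per_dim ** action_dims


def decomposed_output_count(bins_per_dim: int, action_dims: int) -> int:
    """Outputs needed when each action dimension is discretized independently."""
    return bins_per_dim * action_dims


# Hypothetical setting: 11 bins per dimension, 6 action dimensions.
bins, dims = 11, 6
print(joint_action_count(bins, dims))       # 11**6 = 1771561
print(decomposed_output_count(bins, dims))  # 11*6  = 66
```

The joint count grows exponentially with the number of action dimensions, while the per-dimension count grows linearly, which is why independent discretization keeps the output size manageable.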

Abstract

Discrete reinforcement learning (RL) algorithms have demonstrated exceptional performance in solving sequential decision tasks with discrete action spaces, such as Atari games. However, their effectiveness is hindered when applied to continuous control problems due to the challenge of dimensional explosion. In this paper, we present the Soft Decomposed Policy-Critic (SDPC) architecture, which combines soft RL and actor-critic techniques with discrete RL methods to overcome this limitation. SDPC discretizes each action dimension independently and employs a shared critic network to maximize the soft $Q$-function. This novel approach enables SDPC to support two types of policies: decomposed actors that lead to the Soft Decomposed Actor-Critic (SDAC) algorithm, and decomposed $Q$-networks that generate Boltzmann soft exploration policies, resulting in the Soft Decomposed-Critic Q (SDCQ)…
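As a rough illustration of a per-dimension Boltzmann policy of the kind the abstract describes, the sketch below samples each action dimension independently from a softmax over that dimension's $Q$-values. It is a minimal sketch under stated assumptions, not the paper's implementation: the helper names, the evenly spaced bins, the `temperature` parameter, and the hand-written $Q$-values are all hypothetical, and the shared critic and network training are omitted entirely.

```python
import math
import random


def bin_centers(low, high, n_bins):
    """Evenly spaced candidate values covering [low, high] for one dimension (assumed layout)."""
    if n_bins == 1:
        return [(low + high) / 2.0]
    step = (high - low) / (n_bins - 1)
    return [low + i * step for i in range(n_bins)]


def boltzmann_probs(q_values, temperature):
    """Softmax over q_values / temperature, shifted by the max for numerical stability."""
    scaled = [q / temperature for q in q_values]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(per_dim_q, low, high, temperature, rng):
    """Sample every action dimension independently from its own Boltzmann distribution."""
    action = []
    for q_values in per_dim_q:
        centers = bin_centers(low, high, len(q_values))
        probs = boltzmann_probs(q_values, temperature)
        idx = rng.choices(range(len(q_values)), weights=probs, k=1)[0]
        action.append(centers[idx])
    return action


# Hypothetical Q-values for a 2-dimensional action with 3 bins per dimension.
per_dim_q = [[0.1, 0.5, 0.2], [1.0, 0.0, -1.0]]
rng = random.Random(0)
print(sample_action(per_dim_q, -1.0, 1.0, temperature=0.5, rng=rng))
```

A lower temperature concentrates probability on the highest-valued bin in each dimension, while a higher temperature spreads it out and encourages exploration.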

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Smart Grid Security and Resilience