Bayesian Soft Actor-Critic: A Directed Acyclic Strategy Graph Based Deep Reinforcement Learning

Qin Yang; Ramviyas Parasuraman

arXiv:2208.06033 · cs.AI · December 6, 2023

TL;DR

This paper introduces Bayesian Soft Actor-Critic (BSAC), a novel deep reinforcement learning approach that decomposes complex policies into simpler sub-policies using Bayesian strategy networks, leading to improved training efficiency.

Contribution

It proposes a directed acyclic strategy graph decomposition method and integrates it with soft actor-critic (SAC) to form BSAC, which improves policy learning in continuous control tasks.
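
As an illustration only, here is one generic Bayesian-network reading of this decomposition. The notation is assumed and is not necessarily the paper's own equations. If the strategy graph has one node per sub-action a_i with parent set pa(i), the chain rule factorizes the joint policy. The standard SAC maximum-entropy objective, with temperature α and state-action distribution ρ_π, then depends on the joint policy only through the sum of sub-policy log-probabilities:

```latex
\begin{aligned}
\pi(\mathbf{a} \mid s) &= \prod_{i=1}^{n} \pi_i\bigl(a_i \mid s, \mathbf{a}_{\mathrm{pa}(i)}\bigr), \\
\log \pi(\mathbf{a} \mid s) &= \sum_{i=1}^{n} \log \pi_i\bigl(a_i \mid s, \mathbf{a}_{\mathrm{pa}(i)}\bigr), \\
J(\pi) &= \sum_{t} \mathbb{E}_{(s_t, \mathbf{a}_t) \sim \rho_\pi}\bigl[\, r(s_t, \mathbf{a}_t) - \alpha \log \pi(\mathbf{a}_t \mid s_t) \,\bigr].
\end{aligned}
```

Because the sampled actions satisfy a_t ~ π(·|s_t), the term -α log π(a_t|s_t) equals the usual entropy bonus α H(π(·|s_t)) in expectation.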

Findings

1. BSAC achieves better training efficiency than the state-of-the-art deep reinforcement learning (DRL) algorithms it is compared against.
2. The approach decomposes complex policies into simpler, manageable sub-policies.
3. Experiments on standard continuous control benchmarks in OpenAI Gym support these advantages.

Abstract

Adopting reasonable strategies is challenging but crucial for an intelligent agent with limited resources working in hazardous, unstructured, and dynamic environments. Good strategies improve the system's utility, decrease the overall cost, and increase the probability of mission success. This paper proposes a novel directed acyclic strategy graph decomposition approach based on Bayesian chaining to separate an intricate policy into several simple sub-policies and organize their relationships as Bayesian strategy networks (BSN). We integrate this approach into a state-of-the-art deep reinforcement learning (DRL) method, soft actor-critic (SAC), and build the corresponding Bayesian soft actor-critic (BSAC) model by organizing several sub-policies as a joint policy. We compare our method against state-of-the-art DRL algorithms on the standard continuous control benchmarks in the OpenAI Gym environment. The…
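
The abstract describes two steps: composing several sub-policies into a joint policy by Bayesian chaining, then using that joint policy inside SAC. The Python sketch below shows one generic way these steps could fit together. It is not the authors' implementation. The three-node graph `PARENTS`, the linear-Gaussian `sub_policy_params` stub, and all numeric values are hypothetical. The sketch also omits the tanh squashing that SAC typically applies to bounded actions. The soft target is the standard SAC clipped double-Q target. Here its entropy term uses the joint log-probability, computed as the sum of the sub-policy log-probabilities, which is an assumption about how the pieces connect.

```python
import math
import random
from graphlib import TopologicalSorter

# Hypothetical strategy graph: each sub-action maps to the set of its parents.
# "a1" conditions only on the state; "a2" and "a3" also condition on parent actions.
PARENTS = {"a1": set(), "a2": {"a1"}, "a3": {"a1", "a2"}}


def gaussian_log_prob(x, mean, std):
    """Log density of a univariate Gaussian N(mean, std^2) evaluated at x."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def sub_policy_params(state, parent_actions):
    """Hypothetical linear-Gaussian sub-policy (a stand-in for a neural network).

    The mean depends on the scalar state and on the actions of the node's parents.
    The standard deviation is fixed at 1.0, and no tanh squashing is applied.
    """
    mean = state + 0.5 * sum(parent_actions.values())
    return mean, 1.0


def sample_joint_action(state, rng):
    """Sample each sub-action in topological order and accumulate the joint log-prob.

    Implements log pi(a | s) = sum_i log pi_i(a_i | s, a_pa(i)).
    """
    actions, joint_log_prob = {}, 0.0
    for node in TopologicalSorter(PARENTS).static_order():
        # Parents always precede their children in a topological order.
        parent_actions = {p: actions[p] for p in PARENTS[node]}
        mean, std = sub_policy_params(state, parent_actions)
        action = rng.gauss(mean, std)
        actions[node] = action
        joint_log_prob += gaussian_log_prob(action, mean, std)
    return actions, joint_log_prob


def soft_q_target(reward, done, next_q1, next_q2, next_log_prob, gamma=0.99, alpha=0.2):
    """Standard SAC soft Bellman target with clipped double-Q learning.

    y = r + gamma * (1 - done) * (min(Q1', Q2') - alpha * log pi(a' | s')),
    where log pi is the joint log-probability of the composed policy.
    """
    soft_value = min(next_q1, next_q2) - alpha * next_log_prob
    return reward + gamma * (1.0 - done) * soft_value


if __name__ == "__main__":
    rng = random.Random(0)
    next_actions, next_log_prob = sample_joint_action(state=0.3, rng=rng)
    target = soft_q_target(reward=1.0, done=0.0, next_q1=2.0, next_q2=1.5,
                           next_log_prob=next_log_prob)
    print(next_actions, next_log_prob, target)
```

In this sketch, SAC never needs the joint density itself, only log π(a|s) of the composed policy. Summing per-node log-probabilities along the graph is therefore sufficient, and each sub-policy only has to see the state and its parents' actions.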

Taxonomy

Topics: Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI) · Advanced Software Engineering Methodologies

Methods: Clipped Double Q-learning · Global Average Pooling · Target Policy Smoothing · Dilated Convolution · 1x1 Convolution · Average Pooling · Switchable Atrous Convolution · Twin Delayed Deep Deterministic · Entropy Regularization