Bayesian Soft Actor-Critic: A Directed Acyclic Strategy Graph Based Deep Reinforcement Learning
Qin Yang, Ramviyas Parasuraman

TL;DR
This paper introduces Bayesian Soft Actor-Critic (BSAC), a novel deep reinforcement learning approach that decomposes complex policies into simpler sub-policies using Bayesian strategy networks, leading to improved training efficiency.
Contribution
The paper proposes a directed acyclic strategy graph decomposition method and integrates it with soft actor-critic (SAC) to form BSAC, which improves policy learning in continuous control tasks.
Findings
In the reported experiments, BSAC trains more efficiently than the state-of-the-art DRL algorithms it is compared against.
The approach effectively decomposes complex policies into manageable sub-policies.
Experimental results on OpenAI Gym benchmarks validate the method's advantages.
Abstract
Adopting reasonable strategies is challenging but crucial for an intelligent agent with limited resources working in hazardous, unstructured, and dynamic environments to improve the system's utility, decrease the overall cost, and increase mission success probability. This paper proposes a novel directed acyclic strategy graph decomposition approach based on Bayesian chaining to separate an intricate policy into several simple sub-policies and organize their relationships as Bayesian strategy networks (BSN). We integrate this approach into the state-of-the-art DRL method -- soft actor-critic (SAC), and build the corresponding Bayesian soft actor-critic (BSAC) model by organizing several sub-policies as a joint policy. We compare our method against the state-of-the-art deep reinforcement learning algorithms on the standard continuous control benchmarks in the OpenAI Gym environment. The…
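The decomposition described in the abstract can be illustrated with a small sketch. It assumes the joint policy factorizes along the strategy graph in the usual Bayesian-network way, so that the joint log-probability is the sum of each sub-policy's log-probability given the state and its parents' actions. The three-node graph, the function names, and the linear `sub_policy_params` rule are all hypothetical stand-ins chosen to keep the example runnable; they are not taken from the paper, and a real sub-policy would be a neural network.

```python
import math
import random

# Hypothetical 3-node strategy graph: each key is a sub-policy (one action
# dimension), each value lists the parent sub-policies it conditions on.
# The structure is illustrative only, not the graph used in the paper.
STRATEGY_GRAPH = {
    "a1": [],
    "a2": ["a1"],
    "a3": ["a1", "a2"],
}


def topological_order(graph):
    """Return nodes so that every parent precedes its children."""
    order, seen, visiting = [], set(), set()

    def visit(node):
        if node in seen:
            return
        if node in visiting:
            raise ValueError("strategy graph must be acyclic")
        visiting.add(node)
        for parent in graph[node]:
            visit(parent)
        visiting.discard(node)
        seen.add(node)
        order.append(node)

    for node in graph:
        visit(node)
    return order


def sub_policy_params(node, state, parent_actions):
    """Toy stand-in for a sub-policy network: returns (mean, std).

    Assumption: a fixed linear rule, used only so the sketch runs
    without a deep learning library.
    """
    mean = 0.1 * sum(state) + 0.5 * sum(parent_actions)
    std = 0.2 + 0.05 * len(parent_actions)
    return mean, std


def gaussian_log_prob(x, mean, std):
    """Log-density of a univariate Gaussian."""
    return (-0.5 * ((x - mean) / std) ** 2
            - math.log(std) - 0.5 * math.log(2 * math.pi))


def sample_joint_action(state, graph, rng):
    """Ancestral sampling: draw each sub-action given its parents' actions
    and accumulate the joint log-probability term by term."""
    actions, log_prob = {}, 0.0
    for node in topological_order(graph):
        parents = [actions[p] for p in graph[node]]
        mean, std = sub_policy_params(node, state, parents)
        actions[node] = rng.gauss(mean, std)
        log_prob += gaussian_log_prob(actions[node], mean, std)
    return actions, log_prob


def joint_log_prob(state, actions, graph):
    """Evaluate log pi(a | s) as the sum of conditional sub-policy terms."""
    total = 0.0
    for node in graph:
        parents = [actions[p] for p in graph[node]]
        mean, std = sub_policy_params(node, state, parents)
        total += gaussian_log_prob(actions[node], mean, std)
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    state = [0.5, -0.2]
    actions, log_prob = sample_joint_action(state, STRATEGY_GRAPH, rng)
    print(actions, log_prob)
```

In a SAC-style learner, a joint log-probability of this kind is the quantity the entropy term would use. The sketch leaves out the critics, the training loop, and the tanh action squashing that SAC implementations commonly apply.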
Taxonomy
Topics: Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI) · Advanced Software Engineering Methodologies
Methods: Clipped Double Q-learning · Global Average Pooling · Target Policy Smoothing · Dilated Convolution · 1x1 Convolution · Average Pooling · Switchable Atrous Convolution · Twin Delayed Deep Deterministic · Entropy Regularization
