Finite Horizon Multi-Agent Reinforcement Learning in Solving Optimal Control of State-Dependent Switched Systems
Mi Zhou, Jiazhi Li, Masood Mortazavi, Ning Yan, and Chaouki Abdallah

TL;DR
This paper introduces SMADDPG, a multi-agent reinforcement learning method designed for optimal control of state-dependent switched systems, demonstrating superior performance over traditional DDPG in simplified canonical examples.
Contribution
The paper proposes a novel multi-agent deep deterministic policy gradient method tailored for regionally switched systems, with theoretical insights and empirical validation.
Findings
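SMADDPG outperforms vanilla DDPG in the two customized test environments, which have one- and two-dimensional state spaces.
The method learns control policies for regionally switched systems in these canonical examples.
A mathematical analysis under simplifying assumptions explains the observed performance.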
Abstract
In this article, a State-dependent Multi-Agent Deep Deterministic Policy Gradient (SMADDPG) method is proposed in order to learn an optimal control policy for regionally switched systems. We observe good performance of this method and explain it in a rigorous mathematical language using some simplifying assumptions in order to motivate the ideas and to apply them to some canonical examples. Using reinforcement learning, the performance of the switched learning-based multi-agent method is compared with the vanilla DDPG in two customized demonstrative environments with one and two-dimensional state spaces.
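To make the setting in the abstract concrete, here is a minimal sketch of a one-dimensional, state-dependent (regionally) switched system in which a separate controller acts in each region. Everything in it is an assumption made for illustration: the two regions, the coefficients in `A` and `B`, the step size `DT`, the quadratic cost, and the fixed linear feedback gains that stand in for learned per-region agents. It is not the authors' SMADDPG algorithm and performs no learning; it only shows the idea of selecting the acting agent from the current state over a finite horizon.

```python
from typing import Callable, List, Tuple

# Hypothetical 1-D switched system: the active mode depends only on the state.
# Region 0 is x < 0, region 1 is x >= 0. All numbers are illustrative.
A = [-1.0, 0.5]   # per-region drift coefficients (assumed)
B = [1.0, 2.0]    # per-region input coefficients (assumed)
DT = 0.1          # Euler integration step (assumed)


def region_of(x: float) -> int:
    """Return the index of the region that contains state x."""
    return 0 if x < 0.0 else 1


def step(x: float, u: float) -> Tuple[float, float]:
    """Advance one Euler step under the dynamics of the active region.

    Returns the next state and a quadratic stage cost.
    """
    i = region_of(x)
    x_next = x + DT * (A[i] * x + B[i] * u)
    cost = DT * (x * x + u * u)
    return x_next, cost


def make_linear_policy(gain: float) -> Callable[[float], float]:
    """Stand-in for a per-region agent: plain linear state feedback."""
    return lambda x: -gain * x


def rollout(
    x0: float, policies: List[Callable[[float], float]], horizon: int
) -> Tuple[float, float, List[int]]:
    """Run a finite-horizon episode, letting the current region pick the agent."""
    x, total_cost, visited = x0, 0.0, []
    for _ in range(horizon):
        i = region_of(x)       # state-dependent switching
        u = policies[i](x)     # only the agent of the active region acts
        x, cost = step(x, u)
        total_cost += cost
        visited.append(i)
    return x, total_cost, visited


if __name__ == "__main__":
    agents = [make_linear_policy(0.5), make_linear_policy(1.0)]
    x_final, cost, visited = rollout(1.0, agents, horizon=20)
    print(x_final, cost, visited)
```

In a learning-based version, each entry of `policies` would be replaced by a trained actor, with the same region lookup deciding which actor is queried at each step.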
Taxonomy
Topics: Fuel Cells and Related Materials · Electric and Hybrid Vehicle Technologies · Reinforcement Learning in Robotics
Methods: Dense Connections · Experience Replay · Convolution · Batch Normalization · Adam · Weight Decay · Deep Deterministic Policy Gradient
