A reinforcement learning approach to hybrid control design

Meet Gandhi; Atreyee Kundu; Shalabh Bhatnagar

arXiv:2009.00821 · eess.SY · September 3, 2020

TL;DR

This paper proposes a reinforcement learning framework for designing hybrid control policies for hybrid systems whose mathematical models are unknown. It formulates the design problem as a single MDP and adapts PPO to hybrid action spaces; on each benchmark problem considered, the algorithm is observed to converge and find the optimal policy.

Contribution

The paper formulates the hybrid control design problem as a single MDP, so that off-the-shelf RL algorithms can be applied to it, and adapts PPO to hybrid action spaces, enabling model-free learning of control policies.
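To make the single-MDP idea concrete, here is a minimal sketch of a hybrid system wrapped as an MDP whose action has a discrete part (which mode to activate) and a continuous part (the input applied in that mode). Everything in it is a hypothetical illustration and not taken from the paper: the class name `ToySwitchedSystemMDP`, the scalar dynamics, the coefficients, and the quadratic reward are all assumptions chosen for brevity.

```python
import random
from dataclasses import dataclass


@dataclass
class HybridAction:
    mode: int   # discrete part: index of the subsystem (mode) to activate
    u: float    # continuous part: control input applied in that mode


class ToySwitchedSystemMDP:
    """Hypothetical scalar switched system x' = a[mode] * x + b[mode] * u.

    The dynamics live inside step(); a learner only observes states and
    rewards, which mirrors the model-free setting.
    """

    def __init__(self, a=(0.9, 1.1), b=(0.5, 1.0), u_max=1.0, horizon=50, seed=0):
        self.a, self.b = a, b
        self.u_max = u_max
        self.horizon = horizon
        self.rng = random.Random(seed)
        self.x = 0.0
        self.t = 0

    def reset(self):
        # Start each episode from a random initial state in [-1, 1].
        self.x = self.rng.uniform(-1.0, 1.0)
        self.t = 0
        return self.x

    def step(self, action):
        # Saturate the continuous input to the admissible range.
        u = max(-self.u_max, min(self.u_max, action.u))
        # The discrete part of the action selects the active dynamics.
        self.x = self.a[action.mode] * self.x + self.b[action.mode] * u
        self.t += 1
        # Assumed quadratic cost on state and input, expressed as a reward.
        reward = -(self.x ** 2 + 0.1 * u ** 2)
        done = self.t >= self.horizon
        return self.x, reward, done
```

A single call such as `env.step(HybridAction(mode=1, u=0.3))` advances the system by one step, so any RL algorithm that can emit a (mode, input) pair can interact with it through the usual reset/step loop.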

Findings

01

On each benchmark problem, the adapted PPO algorithm is observed to converge and find the optimal policy.

02

Modelling hybrid control design as a single MDP allows off-the-shelf RL algorithms to be applied to it.

03

A set of benchmark hybrid control design problems is modelled in the proposed MDP framework.

Abstract

In this paper we design hybrid control policies for hybrid systems whose mathematical models are unknown. Our contributions are threefold. First, we propose a framework for modelling the hybrid control design problem as a single Markov Decision Process (MDP). This result facilitates the application of off-the-shelf algorithms from the Reinforcement Learning (RL) literature towards designing optimal control policies. Second, we model a set of benchmark examples of hybrid control design problems in the proposed MDP framework. Third, we adapt the recently proposed Proximal Policy Optimisation (PPO) algorithm for the hybrid action space and apply it to the above set of problems. It is observed that in each case the algorithm converges and finds the optimal policy.
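As an illustration of what adapting PPO to a hybrid action space can involve, the sketch below scores a (mode, input) action under a factored policy and evaluates the standard PPO clipped surrogate for one sample. The factorisation into a categorical distribution over modes and a per-mode Gaussian over the continuous input is an assumption made here for illustration; the abstract does not specify the paper's parameterisation, and the function names are hypothetical.

```python
import math


def softmax(logits):
    # Numerically stable softmax over the discrete mode logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def gaussian_log_prob(u, mean, std):
    # Log-density of a scalar Normal(mean, std^2) evaluated at u.
    return -0.5 * ((u - mean) / std) ** 2 - math.log(std) - 0.5 * math.log(2 * math.pi)


def hybrid_log_prob(mode, u, logits, means, stds):
    # Assumed factorisation: log pi(mode, u | s) =
    #   log Categorical(mode) + log Normal(u; means[mode], stds[mode]).
    return math.log(softmax(logits)[mode]) + gaussian_log_prob(u, means[mode], stds[mode])


def ppo_clip_term(new_log_prob, old_log_prob, advantage, eps=0.2):
    # Standard PPO clipped surrogate for a single sample:
    #   min(r * A, clip(r, 1 - eps, 1 + eps) * A), with r = pi_new / pi_old.
    ratio = math.exp(new_log_prob - old_log_prob)
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)
```

Because the joint log-probability is a sum of the discrete and continuous parts, the probability ratio used by PPO is the product of the two per-part ratios, and the clipped objective can be applied to the hybrid action without further changes under this assumed factorisation.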

Taxonomy

Topics: Reinforcement Learning in Robotics · Electric Vehicles and Infrastructure · Adaptive Dynamic Programming Control