Model-free Reinforcement Learning for Branching Markov Decision Processes

Ernst Moritz Hahn, Mateo Perez, Sven Schewe, Fabio Somenzi, Ashutosh Trivedi, Dominik Wojtczak

arXiv:2106.06777 · cs.LG · June 15, 2021

TL;DR

This paper extends model-free reinforcement learning to compute optimal control strategies for Branching Markov Decision Processes (BMDPs), stochastic systems whose state is a collection of entities of multiple types.

Contribution

It generalizes model-free RL techniques to BMDPs, so that an optimal control strategy for an unknown BMDP is computed in the limit, and reports an implementation showing that the approach is practical.

Findings

01. Successful implementation of the RL approach for BMDPs
02. Demonstrated the practicality of the method in complex systems
03. Extended RL techniques to a new class of stochastic processes

Abstract

We study reinforcement learning for the optimal control of Branching Markov Decision Processes (BMDPs), a natural extension of (multitype) Branching Markov Chains (BMCs). The state of a (discrete-time) BMC is a collection of entities of various types that, while spawning other entities, generate a payoff. In a BMC, every entity of the same type evolves according to the same probabilistic pattern; a BMDP instead allows an external controller to pick from a range of options. This permits us to study the best/worst behaviour of the system. We generalise model-free reinforcement learning techniques to compute an optimal control strategy of an unknown BMDP in the limit. We present results of an implementation that demonstrate the practicality of the approach.
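The abstract describes BMCs and BMDPs only in prose. The sketch below is a hypothetical illustration, not the authors' algorithm or benchmark. It encodes a made-up two-type BMDP, where the names `BMDP`, `sample`, `value_iteration`, `q_learning` and all probabilities and rewards are assumptions. The population is treated as a queue of independently evolving entities, and tabular Q-learning is run so that each entity's update target is its immediate reward plus the discounted sum of the best Q-values of its children. Summing over children relies on the assumption that payoffs are additive and entities evolve independently. The per-generation discount `GAMMA` is likewise an assumption, added to keep every value finite. A model-based value iteration on the same toy model serves as a reference.

```python
import random
from collections import deque

# Hypothetical toy BMDP (not from the paper): entity types "A" and "B".
# For each (type, action): list of (probability, reward, offspring) outcomes,
# where offspring is the tuple of entity types that replace the acting entity.
BMDP = {
    "A": {
        "split": [(0.5, 1.0, ("A", "B")), (0.5, 1.0, ())],
        "stay": [(1.0, 0.0, ("B",))],
    },
    "B": {
        "die": [(1.0, 1.0, ())],
        "grow": [(0.5, 0.0, ("A",)), (0.5, 0.0, ())],
    },
}
GAMMA = 0.5  # assumed per-generation discount; keeps all values finite here


def sample(etype, action, rng):
    """Environment step for ONE entity; the learner only sees these samples."""
    u, acc = rng.random(), 0.0
    for p, r, kids in BMDP[etype][action]:
        acc += p
        if u < acc:
            return r, kids
    _, r, kids = BMDP[etype][action][-1]  # guard against float rounding
    return r, kids


def value_iteration(iters=200):
    """Model-based reference: Q(t, a) = E[r + GAMMA * sum of V(child)]."""
    Q = {(t, a): 0.0 for t in BMDP for a in BMDP[t]}
    for _ in range(iters):
        V = {t: max(Q[(t, a)] for a in BMDP[t]) for t in BMDP}
        Q = {
            (t, a): sum(p * (r + GAMMA * sum(V[k] for k in kids))
                        for p, r, kids in BMDP[t][a])
            for t in BMDP for a in BMDP[t]
        }
    return Q


def q_learning(episodes=50_000, alpha=0.02, eps=0.1, max_steps=100, seed=0):
    """Model-free sketch: every entity's step is one Q-learning sample."""
    rng = random.Random(seed)
    Q = {(t, a): 0.0 for t in BMDP for a in BMDP[t]}
    for _ in range(episodes):
        population = deque(["A"])  # the BMDP state: a collection of entities
        steps = 0
        while population and steps < max_steps:
            t = population.popleft()
            acts = list(BMDP[t])
            if rng.random() < eps:
                a = rng.choice(acts)  # explore
            else:
                a = max(acts, key=lambda x: Q[(t, x)])  # exploit
            r, kids = sample(t, a, rng)
            # Independence + additive payoff (assumed): the offspring's value
            # is the sum of the values of the individual children.
            target = r + GAMMA * sum(max(Q[(k, b)] for b in BMDP[k]) for k in kids)
            Q[(t, a)] += alpha * (target - Q[(t, a)])
            population.extend(kids)  # children join the population
            steps += 1
    return Q


def greedy_policy(Q):
    """Pick, for each entity type, the action with the highest Q-value."""
    return {t: max(BMDP[t], key=lambda a: Q[(t, a)]) for t in BMDP}


if __name__ == "__main__":
    exact = value_iteration()
    learned = q_learning()
    for key in sorted(exact):
        print(key, round(exact[key], 3), round(learned[key], 3))
    print(greedy_policy(learned))
```

Under these assumptions the exact optimum is Q(A, split) = 5/3, Q(A, stay) = 1/2, Q(B, die) = 1 and Q(B, grow) = 5/12, so the optimal strategy picks `split` for type A and `die` for type B. With a constant step size the learned table only approximates these values, but the greedy policy derived from it should agree. The learner never reads the probabilities in `BMDP` directly, only samples returned by `sample`, which is what makes this toy learner model-free.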