Gradient-Descent for Randomized Controllers under Partial Observability
Linus Heck, Jip Spel, Sebastian Junges, Joshua Moerman, and Joost-Pieter Katoen

TL;DR
This paper introduces a gradient-based approach for optimizing randomized controllers in partially observable systems, leveraging synthesis algorithms for parametric Markov chains to improve scalability and performance.
Contribution
It defines and evaluates gradients of parametric Markov chains and applies machine learning gradient descent techniques to synthesize controller probabilities.
Findings
Scales to larger pMCs than previous methods
Empirically outperforms the state of the art, often by at least one order of magnitude
Provides a new gradient-based framework for controller synthesis
Abstract
Randomization is a powerful technique to create robust controllers, in particular in partially observable settings. The degrees of randomization have a significant impact on the system performance, yet they are intricate to get right. The use of synthesis algorithms for parametric Markov chains (pMCs) is a promising direction to support the design process of such controllers. This paper shows how to define and evaluate gradients of pMCs. Furthermore, it investigates varieties of gradient descent techniques from the machine learning community to synthesize the probabilities in a pMC. The resulting method scales to significantly larger pMCs than before and empirically outperforms the state-of-the-art, often by at least one order of magnitude.
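To make the idea of a pMC gradient concrete, here is a minimal sketch on a hypothetical two-state example that is not taken from the paper. Both transient states share one observation, so the controller must use the same probability p in each: s0 moves to s1 with probability p (else fails), and s1 reaches the target with probability 1 - p (else fails). The reachability values x solve (I - A(p)) x = b(p), and differentiating that system gives (I - A(p)) x' = A'(p) x + b'(p). The helper names, the step size, and the plain projected gradient step are all assumptions for illustration, not necessarily the variants the paper evaluates.

```python
def solve2(m, r):
    """Solve the 2x2 linear system m @ x = r by Cramer's rule."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [(r[0] * d - b * r[1]) / det, (a * r[1] - c * r[0]) / det]


def value_and_gradient(p):
    """Reachability probability from s0 and its derivative w.r.t. p.

    Hypothetical pMC: A(p) = [[0, p], [0, 0]] over (s0, s1),
    b(p) = [0, 1 - p] is the one-step probability of hitting the target.
    """
    i_minus_a = [[1.0, -p], [0.0, 1.0]]
    b = [0.0, 1.0 - p]
    x = solve2(i_minus_a, b)

    # Right-hand side A'(p) x + b'(p), with A'(p) = [[0, 1], [0, 0]]
    # and b'(p) = [0, -1].
    rhs = [x[1], -1.0]
    dx = solve2(i_minus_a, rhs)
    return x[0], dx[0]


def gradient_ascent(p=0.9, step=0.1, iters=200, eps=1e-6):
    """Maximize the reachability probability with projected gradient steps."""
    for _ in range(iters):
        _, grad = value_and_gradient(p)
        # Project back into (0, 1) so p stays a valid probability.
        p = min(1.0 - eps, max(eps, p + step * grad))
    return p


if __name__ == "__main__":
    p_opt = gradient_ascent()
    print(p_opt, value_and_gradient(p_opt)[0])
```

In this toy model the value is p(1 - p), so the search settles near p = 0.5, a genuinely randomized choice that no deterministic controller (p = 0 or p = 1) can match. Minimization objectives work the same way with the sign of the step flipped.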
Taxonomy
Topics: Machine Learning and Algorithms · Markov Chains and Monte Carlo Methods · Reinforcement Learning in Robotics
