Gradient-Descent for Randomized Controllers under Partial Observability

Linus Heck, Jip Spel, Sebastian Junges, Joshua Moerman, and Joost-Pieter Katoen

arXiv:2111.04407·cs.LO·November 9, 2021

Open Access

TL;DR

This paper introduces a gradient-based approach for optimizing randomized controllers in partially observable systems, leveraging synthesis algorithms for parametric Markov chains to improve scalability and performance.

Contribution

The paper shows how to define and evaluate gradients of parametric Markov chains, and applies gradient descent techniques from machine learning to synthesize the probabilities of a randomized controller.

Findings

01 Scales to significantly larger pMCs than previous methods

02 Empirically outperforms the state of the art, often by at least one order of magnitude

03 Provides a gradient-based framework for synthesizing controller probabilities

Abstract

Randomization is a powerful technique to create robust controllers, in particular in partially observable settings. The degrees of randomization have a significant impact on the system performance, yet they are intricate to get right. The use of synthesis algorithms for parametric Markov chains (pMCs) is a promising direction to support the design process of such controllers. This paper shows how to define and evaluate gradients of pMCs. Furthermore, it investigates varieties of gradient descent techniques from the machine learning community to synthesize the probabilities in a pMC. The resulting method scales to significantly larger pMCs than before and empirically outperforms the state-of-the-art, often by at least one order of magnitude.
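
As an illustration of what evaluating a gradient of a pMC can look like, the sketch below uses a hypothetical two-state pMC that is not taken from the paper. Two states share one observation, so the controller picks its first action with the same probability p in both. The chain reaches the target with probability p(1 - p), which is maximal at p = 0.5, so neither deterministic choice is optimal. The reachability values x solve (I - A(p)) x = b(p), and differentiating that equation gives (I - A(p)) x' = A'(p) x + b'(p), a second linear system with the same matrix. All names (`solve`, `value_and_gradient`, `gradient_ascent`) and the step size are assumptions made for this example, and the update is plain gradient ascent (descent on the negated objective), not one of the specific variants the paper evaluates.

```python
def solve(M, rhs):
    """Solve M x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(M[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def value_and_gradient(p):
    """Reachability probability from s0 and its derivative with respect to p.

    Hypothetical pMC: s0 moves to s1 with probability p (else to a sink);
    s1 moves to the target with probability 1 - p (else to the sink).
    """
    n = 2
    A = [[0.0, p], [0.0, 0.0]]      # transitions among the states s0, s1
    b = [0.0, 1.0 - p]              # one-step probability of reaching the target
    dA = [[0.0, 1.0], [0.0, 0.0]]   # entrywise derivative of A with respect to p
    db = [0.0, -1.0]                # entrywise derivative of b with respect to p
    M = [[(1.0 if i == j else 0.0) - A[i][j] for j in range(n)] for i in range(n)]
    x = solve(M, b)                 # (I - A) x = b
    rhs = [sum(dA[i][j] * x[j] for j in range(n)) + db[i] for i in range(n)]
    dx = solve(M, rhs)              # (I - A) x' = A' x + b'
    return x[0], dx[0]


def gradient_ascent(p=0.1, step=0.2, iterations=200, eps=1e-6):
    """Maximize the reachability probability, keeping p inside (0, 1)."""
    for _ in range(iterations):
        _, grad = value_and_gradient(p)
        p = min(1.0 - eps, max(eps, p + step * grad))
    return p


p_opt = gradient_ascent()
value, grad = value_and_gradient(p_opt)
print(f"p = {p_opt:.4f}, reachability = {value:.4f}, gradient = {grad:.2e}")
```

Because the derivative reuses the matrix I - A(p), each gradient evaluation costs one extra linear solve per parameter in this formulation. Clamping p to [eps, 1 - eps] keeps every transition probability strictly positive, so the graph structure of the chain does not change during the search.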

Taxonomy

Topics: Machine Learning and Algorithms · Markov Chains and Monte Carlo Methods · Reinforcement Learning in Robotics