A Framework for Transforming Specifications in Reinforcement Learning

Rajeev Alur; Suguman Bansal; Osbert Bastani; Kishor Jothimurugan

arXiv:2111.00272·cs.FL·May 31, 2022

A Framework for Transforming Specifications in Reinforcement Learning

Rajeev Alur, Suguman Bansal, Osbert Bastani, Kishor Jothimurugan

PDF

Open Access

TL;DR

This paper introduces a formal framework for transforming reinforcement learning tasks with different objectives, analyzing the theoretical guarantees of such transformations, and identifying limitations in existing approaches for high-level specifications.

Contribution

It develops a formal framework for task transformations in RL, formalizes properties like policy preservation and robustness, and explores the limitations of reductions from temporal logic to reward-based objectives.

Findings

01

Certain reductions from LTL to reward-based objectives do not exist.

02

No RL algorithms with PAC-MDP guarantees for safety specifications.

03

Restates and extends known results on task transformations in RL.

Abstract

Reactive synthesis algorithms allow automatic construction of policies to control an environment modeled as a Markov Decision Process (MDP) that are optimal with respect to high-level temporal logic specifications. However, they assume that the MDP model is known a priori. Reinforcement Learning (RL) algorithms, in contrast, are designed to learn an optimal policy when the transition probabilities of the MDP are unknown, but require the user to associate local rewards with transitions. The appeal of high-level temporal logic specifications has motivated research to develop RL algorithms for synthesis of policies from specifications. To understand the techniques, and nuanced variations in their theoretical guarantees, in the growing body of resulting literature, we develop a formal framework for defining transformations among RL tasks with different forms of objectives. We define the…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsFormal Methods in Verification · Safety Systems Engineering in Autonomy · Software Reliability and Analysis Research