Maximally Permissive Reward Machines
Giovanni Varricchione, Natasha Alechina, Mehdi Dastani, Brian Logan

TL;DR
This paper introduces a novel method for synthesizing reward machines from multiple partial order plans, enabling more flexible learning and higher rewards in temporally extended tasks.
Contribution
It proposes synthesising a "maximally permissive" reward machine from the set of partial-order plans for a goal, and proves that learning with it yields higher rewards than learning with a reward machine built from a single plan.
Findings
Higher rewards achieved with maximally permissive reward machines
Experimental results support theoretical advantages
Outperforms single-plan reward machine approaches
Abstract
Reward machines allow the definition of rewards for temporally extended tasks and behaviors. Specifying "informative" reward machines can be challenging. One way to address this is to generate reward machines from a high-level abstract description of the learning environment, using techniques such as AI planning. However, previous planning-based approaches generate a reward machine based on a single (sequential or partial-order) plan, and do not allow maximum flexibility to the learning agent. In this paper we propose a new approach to synthesising reward machines which is based on the set of partial order plans for a goal. We prove that learning using such "maximally permissive" reward machines results in higher rewards than learning using RMs based on a single plan. We present experimental results which support our theoretical claims by showing that our approach obtains higher rewards…
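To make the idea in the abstract concrete, here is a minimal sketch of a reward machine that accepts any linearisation of any plan in a set of partial-order plans. It is not the authors' construction, which the abstract does not spell out. The plan encoding (a set of steps plus "before, after" pairs), the reward of 1.0 on completion, the self-loop on events that fit no plan, and all names (`enabled`, `step`, `run`, and the `wood`/`iron`/`gold`/`bridge` events) are assumptions made for illustration only.

```python
from typing import FrozenSet, List, Optional, Tuple

# Hypothetical encoding: a plan is (steps, ordering), where ordering holds
# (before, after) pairs that every execution of the plan must respect.
Plan = Tuple[FrozenSet[str], FrozenSet[Tuple[str, str]]]
# A machine state records, per plan, the steps done so far (None = plan abandoned).
State = Tuple[Optional[FrozenSet[str]], ...]


def enabled(plan: Plan, done: FrozenSet[str], event: str) -> bool:
    """True if `event` is a not-yet-done step whose predecessors are all done."""
    steps, order = plan
    if event not in steps or event in done:
        return False
    return all(before in done for before, after in order if after == event)


def step(plans: List[Plan], state: State, event: str) -> Tuple[State, float]:
    """Advance every plan that allows `event`; drop the plans that do not."""
    advanced = tuple(
        done | {event} if done is not None and enabled(plan, done, event) else None
        for plan, done in zip(plans, state)
    )
    if all(done is None for done in advanced):
        return state, 0.0  # assumption: events fitting no plan are ignored
    finished = any(
        done is not None and done == plan[0] for plan, done in zip(plans, advanced)
    )
    return advanced, 1.0 if finished else 0.0  # assumption: reward 1 on completion


def run(plans: List[Plan], events: List[str]) -> float:
    """Total reward collected along a sequence of events."""
    state: State = tuple(frozenset() for _ in plans)
    total = 0.0
    for event in events:
        state, reward = step(plans, state, event)
        total += reward
    return total


# Two illustrative partial-order plans for the same goal (reaching "bridge").
permissive = [
    (frozenset({"wood", "iron", "bridge"}),
     frozenset({("wood", "bridge"), ("iron", "bridge")})),
    (frozenset({"gold", "bridge"}), frozenset({("gold", "bridge")})),
]
# A single sequential plan: wood, then iron, then bridge.
single = [
    (frozenset({"wood", "iron", "bridge"}),
     frozenset({("wood", "iron"), ("iron", "bridge"), ("wood", "bridge")})),
]
```

In this toy setting, `run(permissive, ["iron", "wood", "bridge"])` collects the completion reward, while `run(single, ["iron", "wood", "bridge"])` does not, because the sequential plan rejects `iron` before `wood`. This is only meant to illustrate why a machine built from the whole set of plans leaves the learning agent more freedom than one built from a single plan.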
Taxonomy
Topics: Receptor Mechanisms and Signaling
Methods: Sparse Evolutionary Training
