Maximally Permissive Reward Machines

Giovanni Varricchione; Natasha Alechina; Mehdi Dastani; Brian Logan

arXiv:2408.08059 · cs.LG · August 16, 2024

TL;DR

This paper introduces a method for synthesizing reward machines from the set of partial-order plans for a goal, rather than from a single plan, giving the learning agent more flexibility and yielding higher rewards in temporally extended tasks.

Contribution

It proposes an approach to synthesizing maximally permissive reward machines from partial-order plans, and proves that learning with them results in higher rewards than learning with reward machines based on a single plan.

Findings

01 Maximally permissive reward machines achieve higher rewards

02 Experimental results support the theoretical claims

03 The approach outperforms reward machines built from a single plan

Abstract

Reward machines allow the definition of rewards for temporally extended tasks and behaviors. Specifying "informative" reward machines can be challenging. One way to address this is to generate reward machines from a high-level abstract description of the learning environment, using techniques such as AI planning. However, previous planning-based approaches generate a reward machine based on a single (sequential or partial-order) plan, and do not allow maximum flexibility to the learning agent. In this paper we propose a new approach to synthesising reward machines which is based on the set of partial order plans for a goal. We prove that learning using such "maximally permissive" reward machines results in higher rewards than learning using RMs based on a single plan. We present experimental results which support our theoretical claims by showing that our approach obtains higher rewards…
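To make the idea in the abstract concrete, here is a minimal Python sketch of a reward machine that tracks progress through several partial-order plans at once. It is a toy illustration under stated assumptions, not the construction from the paper: the class names `PartialOrderPlan` and `PlanSetRewardMachine`, the event labels (`wood`, `iron`, `craft`, `gold`, `trade`), and the rule "reward 1.0 when any plan is completed" are all hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartialOrderPlan:
    """Hypothetical plan: a set of action labels plus ordering constraints."""
    actions: frozenset  # action labels in the plan
    before: frozenset   # pairs (a, b) meaning a must occur before b

    def enabled(self, done, event):
        """True if `event` is a not-yet-done action whose predecessors are all done."""
        if event not in self.actions or event in done:
            return False
        return all(a in done for (a, b) in self.before if b == event)


class PlanSetRewardMachine:
    """Toy reward machine whose state records progress in every plan."""

    def __init__(self, plans):
        self.plans = list(plans)
        self.reset()

    def reset(self):
        # One set of completed actions per plan.
        self.state = tuple(frozenset() for _ in self.plans)
        self.finished = False

    def step(self, event):
        """Advance every plan that permits `event`; return the reward."""
        if self.finished:
            return 0.0
        self.state = tuple(
            done | {event} if plan.enabled(done, event) else done
            for plan, done in zip(self.plans, self.state)
        )
        # Assumption: reward 1.0 the first time any plan is completed.
        if any(done == plan.actions
               for plan, done in zip(self.plans, self.state)):
            self.finished = True
            return 1.0
        return 0.0


def total_reward(machine, trace):
    """Sum of rewards the machine emits along a sequence of events."""
    machine.reset()
    return sum(machine.step(event) for event in trace)


# A sequential plan fixes the order: wood, then iron, then craft.
sequential = PartialOrderPlan(
    frozenset({"wood", "iron", "craft"}),
    frozenset({("wood", "iron"), ("iron", "craft")}),
)
# A partial-order plan only requires wood and iron before craft.
plan_a = PartialOrderPlan(
    frozenset({"wood", "iron", "craft"}),
    frozenset({("wood", "craft"), ("iron", "craft")}),
)
# A second, different plan for the same (hypothetical) goal.
plan_b = PartialOrderPlan(
    frozenset({"gold", "trade"}),
    frozenset({("gold", "trade")}),
)
```

In this sketch, a trace such as `["iron", "wood", "craft"]` earns no reward from a machine built on `sequential` alone, but is rewarded by one built on `plan_a`; a trace such as `["gold", "trade"]` is rewarded only when `plan_b` is also included. This mirrors, in a simplified way, the flexibility argument in the abstract: the more plans the machine admits, the more agent behaviours it can reward.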

Code & Models

Repositories

giovannivarr/mprm-ecai24 (Official)
