FORM: Learning Expressive and Transferable First-Order Logic Reward Machines

Leo Ardon; Daniel Furelos-Blanco; Roko Parac; Alessandra Russo

arXiv:2501.00364 · cs.AI · March 3, 2025

TL;DR

This paper introduces First-Order Reward Machines (FORMs), which use first-order logic for more expressive and transferable reward representations in reinforcement learning, enabling scalable learning and multi-agent collaboration.

Contribution

The paper proposes FORM, a novel first-order logic-based reward machine, along with a learning method and multi-agent framework to improve transferability and scalability in RL tasks.

Findings

01. FORMs are more compact and scalable than traditional RMs.

02. FORMs can be learned effectively in complex tasks where learning traditional RMs fails.

03. The multi-agent approach improves transferability and learning speed.

Abstract

Reward machines (RMs) are an effective approach for addressing non-Markovian rewards in reinforcement learning (RL) through finite-state machines. Traditional RMs, which label edges with propositional logic formulae, inherit the limited expressivity of propositional logic. This limitation hinders the learnability and transferability of RMs, since complex tasks require numerous states and edges. To overcome these challenges, we propose First-Order Reward Machines (FORMs), which use first-order logic to label edges, resulting in more compact and transferable RMs. We introduce a novel method for learning FORMs and a multi-agent formulation for exploiting them and facilitating their transferability, where multiple agents collaboratively learn policies for a shared FORM. Our experimental results demonstrate the scalability of…
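To make the idea of first-order edge labels concrete, here is a minimal Python sketch of a finite-state reward machine whose edges are quantified formulae evaluated over the ground atoms observed at each step. It is a toy illustration only, not the paper's formalism or learning method: the names `RewardMachine` and `exists`, the atom encoding, the "key then door" task, and the reward of 1.0 on reaching the accepting state are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Tuple

# Hypothetical encoding: a ground atom is a tuple such as ("key", "k1")
# or ("at", "k1"); a label is the set of atoms observed at one step.
Atom = Tuple[str, ...]
Label = FrozenSet[Atom]
Formula = Callable[[Label], bool]


def exists(type_pred: str, rel_pred: str) -> Formula:
    """Build the edge formula: exists X. type_pred(X) and rel_pred(X)."""
    def holds(label: Label) -> bool:
        # Collect every object X for which type_pred(X) is observed.
        objects = {a[1] for a in label if len(a) == 2 and a[0] == type_pred}
        # The formula holds if rel_pred(X) is also observed for some X.
        return any((rel_pred, obj) in label for obj in objects)
    return holds


@dataclass
class RewardMachine:
    initial: str
    accepting: str
    # For each state, an ordered list of (edge formula, next state).
    edges: Dict[str, List[Tuple[Formula, str]]]

    def step(self, state: str, label: Label) -> Tuple[str, float]:
        """Take the first edge whose formula holds; otherwise stay put."""
        for formula, nxt in self.edges.get(state, []):
            if formula(label):
                # Assumption: reward 1.0 only when entering the accepting state.
                return nxt, 1.0 if nxt == self.accepting else 0.0
        return state, 0.0

    def run(self, trace: List[Label]) -> Tuple[str, float]:
        state, total = self.initial, 0.0
        for label in trace:
            state, reward = self.step(state, label)
            total += reward
        return state, total


# Task: reach any key, then reach any door. Two edges suffice regardless
# of how many keys or doors the environment contains.
form = RewardMachine(
    initial="u0",
    accepting="uA",
    edges={
        "u0": [(exists("key", "at"), "u1")],
        "u1": [(exists("door", "at"), "uA")],
    },
)

objects = {("key", "k1"), ("key", "k2"), ("door", "d1")}
trace = [
    frozenset(objects),
    frozenset(objects | {("at", "k2")}),
    frozenset(objects | {("at", "d1")}),
]
final_state, total_reward = form.run(trace)
```

In this sketch, a propositional machine for the same task would need one edge (or one disjunct) per concrete key and door, whereas the quantified edges stay the same when the set of objects changes. That is the sense in which the abstract describes first-order labels as yielding more compact and transferable machines.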

Taxonomy

Topics: Formal Methods in Verification
