Pushdown Reward Machines for Reinforcement Learning

Giovanni Varricchione; Toryn Q. Klassen; Natasha Alechina; Mehdi Dastani; Brian Logan; Sheila A. McIlraith

arXiv:2508.06894·cs.AI·November 13, 2025

Open Access

TL;DR

This paper introduces pushdown reward machines (pdRMs), an extension of reward machines based on deterministic pushdown automata. pdRMs can encode and reward behaviors representable in deterministic context-free languages, and the paper supports them with theoretical analysis and experiments.

Contribution

The work extends reward machines with the stack of a deterministic pushdown automaton, which makes them expressive enough to represent more complex behaviors in reinforcement learning, and it provides theoretical analysis and practical algorithms.

Findings

01 pdRMs are more expressive than reward machines.

02 The paper gives theoretical bounds on policy equivalence under limited stack access.

03 Experimental results demonstrate successful training on context-free language tasks.

Abstract

Reward machines (RMs) are automata structures that encode (non-Markovian) reward functions for reinforcement learning (RL). RMs can reward any behaviour representable in regular languages and, when paired with RL algorithms that exploit RM structure, have been shown to significantly improve sample efficiency in many domains. In this work, we present pushdown reward machines (pdRMs), an extension of reward machines based on deterministic pushdown automata. pdRMs can recognise and reward temporally extended behaviours representable in deterministic context-free languages, making them more expressive than reward machines. We introduce two variants of pdRM-based policies, one which has access to the entire stack of the pdRM, and one which can only access the top $k$ symbols (for a given constant $k$) of the stack. We propose a procedure to check when the two kinds of policies (for a given…
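
To make the idea concrete, here is a minimal Python sketch of a pushdown-style reward machine. It is an illustration only and not the paper's formal definition: the class `ToyPdRM`, the events `"a"` and `"b"`, the state names, and the bottom marker `"Z"` are all hypothetical. The machine rewards the behaviour "n occurrences of `a` followed by exactly n occurrences of `b`", which a finite-state reward machine cannot track for unbounded n. The `top_k` method shows the kind of truncated stack view that a top-$k$ policy could condition on, under the assumption that the policy input is the environment state, the machine state, and those k symbols.

```python
from dataclasses import dataclass, field

BOTTOM = "Z"  # hypothetical bottom-of-stack marker


@dataclass
class ToyPdRM:
    """Toy pushdown reward machine (illustrative, not the paper's definition).

    Rewards 1.0 when n 'a' events are followed by exactly n 'b' events.
    """
    state: str = "collect"
    stack: list = field(default_factory=lambda: [BOTTOM])

    def step(self, event: str) -> float:
        """Consume one event, update state and stack, return the reward."""
        top = self.stack[-1]
        if self.state == "collect" and event == "a":
            self.stack.append("A")  # remember one more 'a'
            return 0.0
        if self.state in ("collect", "deliver") and event == "b" and top == "A":
            self.stack.pop()  # match this 'b' against a stored 'a'
            self.state = "deliver"
            # Simplification: peek at the new top right after the pop.
            if self.stack[-1] == BOTTOM:
                self.state = "done"
                return 1.0
            return 0.0
        self.state = "fail"  # any other event breaks the pattern
        return 0.0

    def top_k(self, k: int) -> tuple:
        """Return the top k stack symbols, topmost first."""
        start = max(len(self.stack) - k, 0)
        return tuple(reversed(self.stack[start:]))


def run(events, k=2):
    """Feed a sequence of events; return (total reward, state, top-k view)."""
    rm = ToyPdRM()
    total = 0.0
    for e in events:
        total += rm.step(e)
    return total, rm.state, rm.top_k(k)
```

For instance, `run("aabb")` accumulates a reward of 1.0 and ends in state `"done"`, while `run("aaab", k=2)` ends in state `"deliver"` with the view `("A", "A")`: the top two symbols do not reveal how many more `b` events are still owed, which is the kind of information gap between full-stack and top-$k$ policies that the abstract refers to.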

Taxonomy

Topics: Machine Learning and Algorithms · Formal Methods in Verification · Reinforcement Learning in Robotics