Inferring Reward Machines and Transition Machines from Partially Observable Markov Decision Processes

Yuly Wu; Jiamou Liu; Libo Zhang

arXiv:2508.01947·cs.LG·August 5, 2025

TL;DR

This paper introduces a unified automaton inference framework that combines Reward Machines and Transition Machines for POMDPs. It reports up to a 1000x speedup in inference over state-of-the-art methods and handles non-Markovian observations in reinforcement learning.

Contribution

The paper proposes the Dual Behavior Mealy Machine (DBMM) and the DB-RPNI algorithm, which together enable efficient, unified inference of automata for partially observable environments and address the computational cost of earlier inference methods.

Findings

01. Achieves up to 1000x speedup over state-of-the-art methods

02. Successfully infers minimal correct automata in experiments

03. Handles reward-based and transition-based non-Markovianity effectively

Abstract

Partially Observable Markov Decision Processes (POMDPs) are fundamental to many real-world applications. Although reinforcement learning (RL) has shown success in fully observable domains, learning policies from traces in partially observable environments remains challenging due to non-Markovian observations. Inferring an automaton to handle the non-Markovianity has proven effective, but the approach faces two limitations: 1) existing automaton representations focus only on reward-based non-Markovianity, leading to unnatural problem formulations; 2) inference algorithms face enormous computational costs. For the first limitation, we introduce Transition Machines (TMs) to complement existing Reward Machines (RMs). To develop a unified inference algorithm for both automata types, we propose the Dual Behavior Mealy Machine (DBMM) that subsumes both TMs and RMs. We then introduce DB-RPNI, a…
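To make the idea of an automaton that absorbs reward-based non-Markovianity concrete, the following is a minimal sketch of a reward machine written as a Mealy machine: each transition reads an input label and emits a reward. It is an illustration only and not the paper's DBMM or DB-RPNI; the class name `MealyMachine`, the state names `u0`/`u1`/`u2`, and the labels `key`, `goal`, and `none` are hypothetical and chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class MealyMachine:
    """Deterministic Mealy machine: output depends on (state, input)."""
    initial: str
    # Maps (state, input label) -> (next state, emitted output).
    delta: dict = field(default_factory=dict)

    def run(self, inputs):
        """Feed a trace of input labels and return the emitted outputs."""
        state = self.initial
        outputs = []
        for label in inputs:
            state, out = self.delta[(state, label)]
            outputs.append(out)
        return outputs


# Hypothetical task: reward 1 is given the first time "goal" is seen
# after "key". The reward for "goal" therefore depends on history, which
# the machine state (u0: no key yet, u1: key seen, u2: done) records.
labels = ["key", "goal", "none"]
delta = {("u0", "key"): ("u1", 0),
         ("u0", "goal"): ("u0", 0),
         ("u0", "none"): ("u0", 0),
         ("u1", "key"): ("u1", 0),
         ("u1", "goal"): ("u2", 1),
         ("u1", "none"): ("u1", 0)}
# u2 is absorbing: every label keeps the machine there with reward 0.
delta.update({("u2", label): ("u2", 0) for label in labels})

reward_machine = MealyMachine(initial="u0", delta=delta)

rewards = reward_machine.run(["goal", "key", "none", "goal", "goal"])
print(rewards)
```

In this sketch the same label `goal` yields different rewards depending on the machine state, which is the kind of history dependence an inferred automaton is meant to capture. Inference algorithms work in the opposite direction, recovering such a machine from observed traces.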

Taxonomy

Topics: Reinforcement Learning in Robotics · Machine Learning and Algorithms · Formal Methods in Verification