Inferring Reward Machines and Transition Machines from Partially Observable Markov Decision Processes
Yuly Wu, Jiamou Liu, Libo Zhang

TL;DR
This paper introduces a unified automaton inference framework combining Reward Machines and Transition Machines for POMDPs, significantly improving inference efficiency and handling non-Markovian observations in reinforcement learning.
Contribution
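The paper proposes the Dual Behavior Mealy Machine (DBMM) and the DB-RPNI algorithm. Because the DBMM subsumes both Reward Machines and Transition Machines, a single algorithm can infer either automaton type in partially observable environments, addressing the high computational cost of earlier inference methods.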
Findings
Achieves up to 1000x speedup over state-of-the-art methods
Successfully infers minimal correct automata in experiments
Handles reward-based and transition-based non-Markovianity effectively
Abstract
Partially Observable Markov Decision Processes (POMDPs) are fundamental to many real-world applications. Although reinforcement learning (RL) has shown success in fully observable domains, learning policies from traces in partially observable environments remains challenging due to non-Markovian observations. Inferring an automaton to handle the non-Markovianity is a proven effective approach, but faces two limitations: 1) existing automaton representations focus only on reward-based non-Markovianity, leading to unnatural problem formulations; 2) inference algorithms face enormous computational costs. For the first limitation, we introduce Transition Machines (TMs) to complement existing Reward Machines (RMs). To develop a unified inference algorithm for both automata types, we propose the Dual Behavior Mealy Machine (DBMM) that subsumes both TMs and RMs. We then introduce DB-RPNI, a…
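To make the notion of reward-based non-Markovianity concrete, here is a minimal sketch of a Reward Machine written as a deterministic Mealy machine: each (state, input) pair maps to a next state and an output, and the output is read as a reward. The class name `MealyMachine`, the labels `coffee` and `office`, and the self-loop default are illustrative assumptions, not the paper's DBMM definition or its DB-RPNI algorithm.

```python
from dataclasses import dataclass
from typing import Dict, Hashable, List, Tuple


@dataclass
class MealyMachine:
    """Deterministic Mealy machine: the output depends on (state, input)."""
    initial: str
    # (state, input symbol) -> (next state, output)
    delta: Dict[Tuple[str, Hashable], Tuple[str, Hashable]]
    default_output: Hashable = 0

    def run(self, inputs: List[Hashable]) -> List[Hashable]:
        state, outputs = self.initial, []
        for sym in inputs:
            # Assumption: unlisted pairs self-loop and emit the default output.
            state, out = self.delta.get((state, sym), (state, self.default_output))
            outputs.append(out)
        return outputs


# Hypothetical task: reward 1 only for reaching "office" after "coffee".
reward_machine = MealyMachine(
    initial="u0",
    delta={
        ("u0", "coffee"): ("u1", 0),
        ("u1", "office"): ("u2", 1),
    },
)

# The same observation "office" yields different rewards depending on history.
rewards_a = reward_machine.run(["office", "coffee", "office"])
rewards_b = reward_machine.run(["office", "office"])
print(rewards_a, rewards_b)
```

In this sketch the machine state carries exactly the history that the raw observation lacks, which is why the reward becomes Markovian once the observation is paired with the machine state. It covers only the reward side; how the paper's Transition Machines and the DBMM are formally defined is not shown here.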
Taxonomy
Topics: Reinforcement Learning in Robotics · Machine Learning and Algorithms · Formal Methods in Verification
