Active Reward Machine Inference From Raw State Trajectories
Mohamad Louai Shehab, Antoine Aspeel, Necmiye Ozay

TL;DR
This paper introduces a method to learn reward machines directly from raw state trajectories without prior reward or label information, extends it with active learning to improve efficiency, and demonstrates it on grid world examples.
Contribution
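It presents a novel approach for inferring reward machines from raw state and policy information, with no access to observations of rewards, labels, or machine nodes, and extends it to an active learning setting that improves data and, indirectly, computational efficiency.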
Findings
The method learns reward machines from raw state trajectories without reward labels.
Active learning improves the data efficiency of reward machine inference.
The approach is validated on several grid world scenarios.
Abstract
Reward machines are automaton-like structures that capture the memory required to accomplish a multi-stage task. When combined with reinforcement learning or optimal control methods, they can be used to synthesize robot policies to achieve such tasks. However, specifying a reward machine by hand, including a labeling function capturing high-level features that the decisions are based on, can be a daunting task. This paper deals with the problem of learning reward machines directly from raw state and policy information. As opposed to existing works, we assume no access to observations of rewards, labels, or machine nodes, and show what trajectory data is sufficient for learning the reward machine in this information-scarce regime. We then extend the result to an active learning setting where we incrementally query trajectory extensions to improve data (and indirectly computational)…
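To make the abstract's terminology concrete, the following is a minimal sketch of what a reward machine and a labeling function could look like on a grid world. It is an illustration only, not the paper's inference method: the `RewardMachine` class, the `run` and `labeler` helpers, the "visit A, then B" task, and the cell coordinates are all hypothetical. The paper's setting assumes that neither the labels nor the machine nodes shown here are observed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

State = Tuple[int, int]  # raw grid-world state as (row, col); an assumption


@dataclass
class RewardMachine:
    """Hypothetical reward machine: nodes are ints, inputs are labels."""
    initial_node: int
    # Maps (node, label) -> (next_node, reward).
    transitions: Dict[Tuple[int, str], Tuple[int, float]]

    def step(self, node: int, label: str) -> Tuple[int, float]:
        # Unlisted (node, label) pairs self-loop with zero reward.
        return self.transitions.get((node, label), (node, 0.0))


def run(rm: RewardMachine,
        labeler: Callable[[State], str],
        trajectory: List[State]) -> Tuple[int, List[float]]:
    """Feed a raw state trajectory through the labeler and the machine."""
    node = rm.initial_node
    rewards = []
    for state in trajectory:
        node, reward = rm.step(node, labeler(state))
        rewards.append(reward)
    return node, rewards


def labeler(state: State) -> str:
    # Hypothetical labeling function: two special cells, all others "empty".
    return {(0, 0): "A", (2, 2): "B"}.get(state, "empty")


# Hypothetical two-stage task: reach A first, then B (reward 1 on completion).
rm = RewardMachine(
    initial_node=0,
    transitions={(0, "A"): (1, 0.0), (1, "B"): (2, 1.0)},
)

in_order = [(1, 1), (0, 0), (1, 1), (2, 2)]      # visits A, then B
out_of_order = [(2, 2), (1, 1), (0, 0), (1, 1)]  # visits B, then A

print(run(rm, labeler, in_order))      # final node 2, reward on the last step
print(run(rm, labeler, out_of_order))  # final node 1, no reward collected
```

The two trajectories visit the same cells but receive different rewards, which is the memory that the machine nodes capture: the reward depends on the history of labels, not only on the current raw state.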
