Learning Task Automata for Reinforcement Learning using Hidden Markov Models
Alessandro Abate (1), Yousif Almulla (2), James Fox (1), David Hyland (1), Michael Wooldridge (1) ((1) University of Oxford, (2) Microsoft Azure, Quantum)

TL;DR
This paper introduces a method to learn task automata from agent experiences in unknown environments, improving reinforcement learning efficiency and interpretability by modeling non-Markovian rewards with hidden Markov models.
Contribution
The paper presents a novel pipeline that learns deterministic finite automata representing tasks from episodes, using hidden Markov models and product MDPs, enhancing transferability and interpretability.
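To make the idea of a task automaton and a product MDP concrete, here is a minimal sketch, not the paper's implementation. Every name in it (`TaskDFA`, `product_step`, the four-cell corridor, the labels "key" and "door") is a hypothetical illustration. It shows a "fetch the key, then reach the door" task whose reward depends on history in the environment alone, but becomes Markovian once the automaton state is paired with the environment state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskDFA:
    """Deterministic finite automaton over environment labels (hypothetical)."""
    transitions: dict      # (automaton state, label) -> next automaton state
    initial: int
    accepting: frozenset

    def step(self, q, label):
        # Labels without an explicit edge leave the automaton state unchanged.
        return self.transitions.get((q, label), q)

    def accepts(self, labels):
        q = self.initial
        for label in labels:
            q = self.step(q, label)
        return q in self.accepting


# Task: observe "key" first, then "door".
task = TaskDFA(transitions={(0, "key"): 1, (1, "door"): 2},
               initial=0, accepting=frozenset({2}))


def env_step(s, action):
    # Toy corridor with cells 0..3; actions are -1 (left) or +1 (right).
    return min(max(s + action, 0), 3)


def labelling(s):
    # Assumed labelling function: door at cell 0, key at cell 3.
    return {0: "door", 3: "key"}.get(s, "none")


def product_step(dfa, state, action):
    """One step in the product of the environment and the task automaton."""
    s, q = state
    s_next = env_step(s, action)
    q_next = dfa.step(q, labelling(s_next))
    # Reward is paid once, on first entering an accepting automaton state.
    reward = 1.0 if q_next in dfa.accepting and q not in dfa.accepting else 0.0
    return (s_next, q_next), reward


def rollout(actions, start=1):
    state, total = (start, task.initial), 0.0
    for a in actions:
        state, r = product_step(task, state, a)
        total += r
    return state, total


direct = rollout([-1])                  # walks straight to the door
via_key = rollout([1, 1, -1, -1, -1])   # picks up the key first
```

Both rollouts end in environment cell 0, yet only the second is rewarded. The environment state alone cannot explain the reward, while the product state `(cell, automaton state)` can.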
Findings
Effective learning of task automata from limited episodes
Improved policy synthesis speed due to task decomposition
Automata are environment-agnostic and suitable for transfer learning
Abstract
Training reinforcement learning (RL) agents using scalar reward signals is often infeasible when an environment has sparse and non-Markovian rewards. Moreover, handcrafting these reward functions before training is prone to misspecification, especially when the environment's dynamics are only partially known. This paper proposes a novel pipeline for learning non-Markovian task specifications as succinct finite-state "task automata" from episodes of agent experience within unknown environments. We leverage two key algorithmic insights. First, we learn a product MDP, a model composed of the specification's automaton and the environment's MDP (both initially unknown), by treating the product MDP as a partially observable MDP and using the well-known Baum-Welch algorithm for learning hidden Markov models. Second, we propose a novel method for distilling the task automaton (assumed to be a…
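The abstract names the Baum-Welch algorithm as the learning step. As a rough illustration only, the sketch below fits a plain discrete hidden Markov model to a single observation sequence with scaled forward-backward EM in NumPy. It is a simplification and not the paper's product-MDP formulation: it has no actions, it assumes the hidden states stand in for unobserved automaton states, and the function name `baum_welch`, its parameters, and the toy sequence are all hypothetical.

```python
import numpy as np


def baum_welch(obs, n_states, n_symbols, n_iter=20, seed=0):
    """Fit a discrete HMM to one sequence; returns (pi, A, B, log_liks)."""
    rng = np.random.default_rng(seed)
    obs = np.asarray(obs)
    T = len(obs)
    # Random row-stochastic starting point (an arbitrary choice).
    A = rng.random((n_states, n_states))
    A /= A.sum(axis=1, keepdims=True)
    B = rng.random((n_states, n_symbols))
    B /= B.sum(axis=1, keepdims=True)
    pi = np.full(n_states, 1.0 / n_states)
    log_liks = []
    for _ in range(n_iter):
        # E-step, forward pass with per-step scaling factors c[t].
        alpha = np.zeros((T, n_states))
        c = np.zeros(T)
        alpha[0] = pi * B[:, obs[0]]
        c[0] = alpha[0].sum()
        alpha[0] /= c[0]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
            c[t] = alpha[t].sum()
            alpha[t] /= c[t]
        # Backward pass, reusing the same scaling factors.
        beta = np.ones((T, n_states))
        for t in range(T - 2, -1, -1):
            beta[t] = (A @ (B[:, obs[t + 1]] * beta[t + 1])) / c[t + 1]
        # Posterior state marginals; each row sums to 1 under this scaling.
        gamma = alpha * beta
        # Expected transition counts, summed over time.
        xi = np.zeros((n_states, n_states))
        for t in range(T - 1):
            xi += (alpha[t][:, None] * A
                   * (B[:, obs[t + 1]] * beta[t + 1])[None, :]) / c[t + 1]
        # Log-likelihood of the data under the parameters used in this E-step.
        log_liks.append(float(np.log(c).sum()))
        # M-step: re-estimate parameters from expected counts.
        pi = gamma[0]
        A = xi / gamma[:-1].sum(axis=0)[:, None]
        for k in range(n_symbols):
            B[:, k] = gamma[obs == k].sum(axis=0)
        B /= gamma.sum(axis=0)[:, None]
    return pi, A, B, log_liks


# Toy observation sequence over three symbols (made up for illustration).
sequence = [0, 0, 1, 2, 2, 1, 0, 0, 1, 2, 2, 1, 0, 1, 2, 2, 0, 0, 1, 2] * 3
pi, A, B, log_liks = baum_welch(sequence, n_states=2, n_symbols=3)
```

Because this is expectation-maximisation, the recorded log-likelihoods should not decrease from one iteration to the next, which is a useful sanity check. The estimate still depends on the random initialisation and may settle in a local optimum.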
Taxonomy
Topics: Reinforcement Learning in Robotics · Optimization and Search Problems · Machine Learning and Algorithms
