Learning Task Automata for Reinforcement Learning using Hidden Markov Models

Alessandro Abate (1), Yousif Almulla (2), James Fox (1), David Hyland (1), Michael Wooldridge (1) ((1) University of Oxford, (2) Microsoft Azure Quantum)

arXiv:2208.11838 · cs.LG · October 4, 2023 · 1 cites

TL;DR

This paper introduces a method to learn task automata from agent experiences in unknown environments, improving reinforcement learning efficiency and interpretability by modeling non-Markovian rewards with hidden Markov models.

Contribution

The paper presents a novel pipeline that learns deterministic finite automata representing tasks from episodes, using hidden Markov models and product MDPs, enhancing transferability and interpretability.

Findings

01. Effective learning of task automata from limited episodes

02. Improved policy synthesis speed due to task decomposition

03. Automata are environment-agnostic and suitable for transfer learning

Abstract

Training reinforcement learning (RL) agents using scalar reward signals is often infeasible when an environment has sparse and non-Markovian rewards. Moreover, handcrafting these reward functions before training is prone to misspecification, especially when the environment's dynamics are only partially known. This paper proposes a novel pipeline for learning non-Markovian task specifications as succinct finite-state 'task automata' from episodes of agent experience within unknown environments. We leverage two key algorithmic insights. First, we learn a product MDP, a model composed of the specification's automaton and the environment's MDP (both initially unknown), by treating the product MDP as a partially observable MDP and using the well-known Baum-Welch algorithm for learning hidden Markov models. Second, we propose a novel method for distilling the task automaton (assumed to be a…
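
To make the notion of a product MDP concrete, the following is a minimal sketch, not the paper's learning pipeline. It hand-writes a small task automaton and a toy environment (in the paper both are unknown and must be learned) and shows how pairing the environment state with the automaton state turns a history-dependent reward into one that depends only on the current product state. The corridor, the labels 'a' and 'b', the task "see 'a', then 'b'", and all function names are assumptions made purely for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical task automaton (DFA) for "observe 'a', then later 'b'".
# States: 0 = nothing seen yet, 1 = 'a' seen, 2 = accepting.
DFA_DELTA: Dict[Tuple[int, str], int] = {
    (0, "a"): 1,
    (1, "b"): 2,
}
DFA_INIT = 0
DFA_ACCEPT = {2}


def dfa_step(q: int, label: str) -> int:
    """Advance the automaton; unlisted (state, label) pairs self-loop."""
    return DFA_DELTA.get((q, label), q)


# Hypothetical environment: a corridor with cells 0..4.
# Cell 1 carries label 'b' and cell 3 carries label 'a'.
LABELS: Dict[int, str] = {1: "b", 3: "a"}


def env_step(s: int, action: int) -> int:
    """Deterministic move left (-1) or right (+1), clipped to the corridor."""
    return max(0, min(4, s + action))


def product_step(state: Tuple[int, int], action: int) -> Tuple[Tuple[int, int], float]:
    """One step in the product: move in the environment, then feed the
    label of the new cell to the automaton. Reward is paid once, on the
    transition that first reaches an accepting automaton state."""
    s, q = state
    s_next = env_step(s, action)
    q_next = dfa_step(q, LABELS.get(s_next, ""))
    reward = 1.0 if (q_next in DFA_ACCEPT and q not in DFA_ACCEPT) else 0.0
    return (s_next, q_next), reward


def run(actions: List[int]) -> Tuple[Tuple[int, int], float]:
    """Roll out a fixed action sequence from cell 0 and the initial DFA state."""
    state = (0, DFA_INIT)
    total = 0.0
    for a in actions:
        state, r = product_step(state, a)
        total += r
    return state, total


if __name__ == "__main__":
    # Reaching 'b' before 'a' earns nothing; 'a' then 'b' earns the reward.
    print(run([1]))
    print(run([1, 1, 1, -1, -1]))
```

In this sketch the agent stands in cell 1 (label 'b') at the end of both rollouts, yet only the second is rewarded. The environment state alone cannot explain the difference, whereas the product state (cell, automaton state) can.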

Taxonomy

Topics: Reinforcement Learning in Robotics · Optimization and Search Problems · Machine Learning and Algorithms