Reinforcement Learning in POMDPs with Memoryless Options and Option-Observation Initiation Sets

Denis Steckelmacher; Diederik M. Roijers; Anna Harutyunyan; Peter Vrancx; Hélène Plisnier; Ann Nowé

arXiv:1708.06551·cs.AI·September 13, 2017

TL;DR

This paper introduces Option-Observation Initiation Sets (OOIs), a hierarchical approach that simplifies learning in partially observable environments by making the initiation of an option depend on the previously executed option, which keeps learning sample-efficient while remaining at least as expressive as Finite State Controllers.

Contribution

The paper proposes OOIs, enabling memoryless options conditioned on previous options, which are easier to design, more interpretable, and more sample-efficient than recurrent methods in POMDPs.

Findings

01 OOIs are at least as expressive as Finite State Controllers (FSCs) in POMDPs.

02 Agents with OOIs learn optimal policies more sample-efficiently than agents based on recurrent neural networks.

03 OOIs lead to explainable policies in which both the top-level policy and the option policies are memoryless.

Abstract

Many real-world reinforcement learning problems have a hierarchical nature, and often exhibit some degree of partial observability. While hierarchy and partial observability are usually tackled separately (for instance by combining recurrent neural networks and options), we show that addressing both problems simultaneously is simpler and more efficient in many cases. More specifically, we make the initiation set of options conditional on the previously executed option, and show that options with such Option-Observation Initiation Sets (OOIs) are at least as expressive as Finite State Controllers (FSCs), a state-of-the-art approach for learning in POMDPs. OOIs are easy to design based on an intuitive description of the task, lead to explainable policies and keep the top-level and option policies memoryless. Our experiments show that OOIs allow agents to learn optimal policies in…
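The central mechanism described above, an initiation set that depends on the previously executed option rather than on the current observation alone, can be illustrated with a small Python sketch. It is not the authors' implementation: the `Option` class, its `allowed_after` field, the `initiable`, `top_level_choice` and `rollout` helpers, the option names, the observation `"corridor"` and the hand-written `SCORES` table (a stand-in for learned values) are all hypothetical, and termination functions and intra-option policies are omitted.

```python
from dataclasses import dataclass

# Hypothetical marker meaning "no option has been executed yet".
START = None


@dataclass(frozen=True)
class Option:
    name: str
    # Option-Observation Initiation Set (sketch): the previously executed
    # options after which this option is allowed to start.
    allowed_after: frozenset


def initiable(options, previous):
    """Return the options whose initiation set admits the previous option."""
    return [o for o in options if previous in o.allowed_after]


def top_level_choice(options, previous, observation, scores):
    """Memoryless top-level policy: scores depend only on the current
    observation; the previous option only masks which options may start."""
    candidates = initiable(options, previous)
    return max(candidates, key=lambda o: scores[(observation, o.name)])


def rollout(options, observations, scores):
    """Pick one option per observation, remembering only the last option."""
    previous, chosen = START, []
    for observation in observations:
        option = top_level_choice(options, previous, observation, scores)
        chosen.append(option.name)
        previous = option.name
    return chosen


# Hypothetical task: fetch a key, then go to a door, in a corridor whose
# observation never reveals whether the key is already held.
OPTIONS = [
    Option("go_to_key", frozenset({START, "go_to_door"})),
    Option("go_to_door", frozenset({"go_to_key"})),
    Option("explore", frozenset({START, "go_to_key", "go_to_door"})),
]
# Hand-written stand-in for learned top-level values.
SCORES = {
    ("corridor", "go_to_key"): 0.5,
    ("corridor", "go_to_door"): 1.0,
    ("corridor", "explore"): 0.1,
}

print(rollout(OPTIONS, ["corridor"] * 3, SCORES))
# ['go_to_key', 'go_to_door', 'go_to_key']
```

Every decision in this toy rollout sees the same observation, so a memoryless top-level policy without OOIs would pick `go_to_door` every time; the mask derived from the previous option is what lets it alternate between fetching the key and going to the door.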
