Robust Asymmetric Learning in POMDPs

Andrew Warrington; J. Wilder Lavington; Adam Ścibior; Mark Schmidt; Frank Wood

arXiv:2012.15566 · cs.LG · July 2, 2021 · 6 cites

TL;DR

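This paper introduces adaptive asymmetric DAgger (A2D), a method for learning policies in POMDPs. It addresses a flaw in existing imitation approaches, where an expert with full state access may encourage actions that are sub-optimal or unsafe for an agent acting under partial observability, by jointly training the expert and the agent so that the expert maximizes the agent's expected reward.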

Contribution

The paper proposes a novel objective and algorithm (A2D) that trains an expert to maximize the agent’s expected reward, improving imitation safety and performance in POMDPs.

Findings

1. A2D produces expert policies that are safe for imitation.

2. A2D outperforms fixed expert imitation in POMDPs.

3. The method effectively handles partial observability in policy learning.

Abstract

Policies for partially observed Markov decision processes can be efficiently learned by imitating policies for the corresponding fully observed Markov decision processes. Unfortunately, existing approaches for this kind of imitation learning have a serious flaw: the expert does not know what the trainee cannot see, and so may encourage actions that are sub-optimal, even unsafe, under partial information. We derive an objective to instead train the expert to maximize the expected reward of the imitating agent policy, and use it to construct an efficient algorithm, adaptive asymmetric DAgger (A2D), that jointly trains the expert and the agent. We show that A2D produces an expert policy that the agent can safely imitate, in turn outperforming policies learned by imitating a fixed expert.
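
The flaw described in the abstract can be illustrated with a small worked example. The sketch below is not the paper's A2D algorithm and is not taken from the plai-group/a2d repository; it is a hypothetical one-step problem (the states, rewards, and all function names are assumptions made for illustration). The agent sees nothing, so the closest it can get to a state-conditioned expert, in the cross-entropy sense, is the expert's action distribution averaged over states. The code compares the reward of imitating the reward-optimal fully observed expert with the reward obtained when the expert is instead chosen to maximize the imitating agent's reward, here by brute force over deterministic experts.

```python
from itertools import product

# Hypothetical toy problem (an assumption for illustration, not from the paper):
# two equally likely hidden states, three actions (0 = go left, 1 = go right, 2 = stay safe).
N_STATES, N_ACTIONS = 2, 3
P_STATE = [0.5, 0.5]
REWARD = [[1.0, -10.0, 0.0],   # REWARD[s][a] when the hidden state is s = 0
          [-10.0, 1.0, 0.0]]   # REWARD[s][a] when the hidden state is s = 1


def expected_reward(policy):
    """Expected one-step reward, where policy[s][a] = P(action a | state s)."""
    return sum(P_STATE[s] * policy[s][a] * REWARD[s][a]
               for s in range(N_STATES) for a in range(N_ACTIONS))


def one_hot(action):
    """Deterministic action distribution that always picks `action`."""
    return [1.0 if a == action else 0.0 for a in range(N_ACTIONS)]


def implied_agent(expert):
    """Policy of an agent with no observation that imitates `expert`.

    Minimising the expected cross-entropy to the expert over a
    state-independent policy yields the expert's state-averaged actions.
    """
    marginal = [sum(P_STATE[s] * expert[s][a] for s in range(N_STATES))
                for a in range(N_ACTIONS)]
    return [marginal for _ in range(N_STATES)]


# Expert that is optimal when the state is visible: go left in s=0, right in s=1.
mdp_expert = [one_hot(0), one_hot(1)]
expert_reward = expected_reward(mdp_expert)
imitator_reward = expected_reward(implied_agent(mdp_expert))

# Alternative objective: choose the expert whose *imitator* earns the most reward.
candidates = list(product(range(N_ACTIONS), repeat=N_STATES))
best_choice = max(
    candidates,
    key=lambda c: expected_reward(implied_agent([one_hot(a) for a in c])),
)
best_reward = expected_reward(implied_agent([one_hot(a) for a in best_choice]))

print("fully observed expert reward:", expert_reward)
print("reward when imitating that expert blindly:", imitator_reward)
print("expert chosen for the agent:", best_choice, "agent reward:", best_reward)
```

In this toy setting the fully observed expert earns 1.0, the agent imitating it earns -4.5 because it splits its probability between the two risky actions, and the expert selected for the agent's benefit always stays safe, giving the agent a reward of 0. The paper's method replaces this brute-force search with joint training of the expert and the agent.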

Code & Models

Repositories

plai-group/a2d (PyTorch, official)

Videos

Robust Asymmetric Learning in POMDPs · SlidesLive

Taxonomy

Topics: Machine Learning and ELM · Machine Learning and Algorithms · Neural Networks and Applications