Learning without Knowing: Unobserved Context in Continuous Transfer Reinforcement Learning

Chenyu Liu; Yan Zhang; Yi Shen; Michael M. Zavlanos

arXiv:2106.03833 · cs.LG · June 8, 2021 · 1 citation

TL;DR

This paper introduces a novel transfer reinforcement learning approach that handles unobserved context by formulating the problem as a causal bound-constrained multi-armed bandit, enabling faster and more stable policy learning.

Contribution

It proposes a new method combining causal bounds and bandit algorithms to learn context-unaware policies from expert data with unobserved context, reducing exploration variance.
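As a rough illustration of the idea summarized above, here is a minimal Python sketch of a bound-constrained UCB bandit over a finite set of candidate actions. It is not the paper's algorithm (this page does not give its details, and the paper works in continuous state and action spaces); it assumes per-arm lower and upper bounds on the expected reward are already available, for instance causal bounds computed from expert data. The function `bounded_ucb` and its parameters are hypothetical. Arms whose upper bound falls below the best lower bound are pruned, and each optimistic UCB index is capped at the arm's upper bound, which is one generic way bounds can cut down unnecessary exploration.

```python
import math
import random
from typing import Callable, List, Sequence


def bounded_ucb(
    lower: Sequence[float],
    upper: Sequence[float],
    pull: Callable[[int], float],
    horizon: int,
) -> List[int]:
    """UCB1 over a finite set of arms, constrained by per-arm bounds
    lower[a] <= E[reward of a] <= upper[a] (assumed given, e.g. causal bounds)."""
    n_arms = len(lower)
    # An arm whose upper bound is below the best lower bound can never be optimal,
    # so it is dropped before any exploration happens.
    best_lower = max(lower)
    active = [a for a in range(n_arms) if upper[a] >= best_lower]

    counts = [0] * n_arms
    sums = [0.0] * n_arms
    history: List[int] = []

    for t in range(1, horizon + 1):
        untried = [a for a in active if counts[a] == 0]
        if untried:
            arm = untried[0]  # play each surviving arm once
        else:
            def index(a: int) -> float:
                mean = sums[a] / counts[a]
                bonus = math.sqrt(2.0 * math.log(t) / counts[a])
                # Optimism is capped at the arm's upper bound, shrinking exploration.
                return min(mean + bonus, upper[a])

            arm = max(active, key=index)
        reward = pull(arm)  # hypothetical environment call, reward in [0, 1]
        counts[arm] += 1
        sums[arm] += reward
        history.append(arm)
    return history


if __name__ == "__main__":
    means = [0.3, 0.5, 0.7]  # hypothetical true Bernoulli means
    rng = random.Random(0)
    hist = bounded_ucb(
        lower=[0.1, 0.4, 0.6],
        upper=[0.4, 0.65, 0.9],
        pull=lambda a: 1.0 if rng.random() < means[a] else 0.0,
        horizon=2000,
    )
    print({a: hist.count(a) for a in range(3)})
```

With these illustrative bounds, arm 0 is never played, because its upper bound (0.4) lies below arm 2's lower bound (0.6); arm 1 is explored only briefly, since its index can never exceed 0.65.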

Findings

1. Faster policy improvement compared to existing imitation methods
2. Lower variance during training
3. Effective handling of unobserved contextual information

Abstract

In this paper, we consider a transfer Reinforcement Learning (RL) problem in continuous state and action spaces, under unobserved contextual information. For example, the context can represent the mental view of the world that an expert agent has formed through past interactions with this world. We assume that this context is not accessible to a learner agent who can only observe the expert data. Then, our goal is to use the context-aware expert data to learn an optimal context-unaware policy for the learner using only a few new data samples. Such problems are typically solved using imitation learning that assumes that both the expert and learner agents have access to the same information. However, if the learner does not know the expert context, using the expert data alone will result in a biased learner policy and will require many new data samples to improve. To address this…
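The abstract's point that context-aware expert data alone yields a biased learner policy can be seen in a toy discrete example. The sketch below assumes binary actions, rewards in [0, 1], and the standard "natural" bounds on the interventional mean E[Y | do(A = a)], namely E[Y | A = a]P(A = a) ≤ E[Y | do(A = a)] ≤ E[Y | A = a]P(A = a) + 1 − P(A = a). It is not necessarily the bound the paper derives for continuous state and action spaces, and all names (`expert_data`, `naive_estimates`, `natural_bounds`) are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[int, float]  # (action, reward) as seen by the learner


def expert_data(n: int) -> List[Sample]:
    """Toy expert: sees a binary context u that the learner never observes,
    always plays a = u, and earns reward 1 exactly when a == u."""
    data: List[Sample] = []
    for i in range(n):
        u = i % 2                   # context, uniform over {0, 1}; not recorded
        a = u                       # context-aware expert policy
        y = 1.0 if a == u else 0.0  # reward in [0, 1]
        data.append((a, y))
    return data


def naive_estimates(data: Sequence[Sample], actions: Sequence[int]) -> Dict[int, float]:
    """Imitation-style estimate E[Y | A = a] computed from expert data alone."""
    out: Dict[int, float] = {}
    for a in actions:
        ys = [y for act, y in data if act == a]
        out[a] = sum(ys) / len(ys) if ys else float("nan")
    return out


def natural_bounds(
    data: Sequence[Sample], actions: Sequence[int]
) -> Dict[int, Tuple[float, float]]:
    """Bounds on E[Y | do(A = a)] for rewards in [0, 1]:
    E[Y|a]P(a) <= E[Y|do(a)] <= E[Y|a]P(a) + 1 - P(a)."""
    n = len(data)
    out: Dict[int, Tuple[float, float]] = {}
    for a in actions:
        ys = [y for act, y in data if act == a]
        p_a = len(ys) / n
        mean = sum(ys) / len(ys) if ys else 0.0
        lo = mean * p_a
        out[a] = (lo, lo + (1.0 - p_a))
    return out
```

For `data = expert_data(1000)`, `naive_estimates(data, [0, 1])` gives 1.0 for both actions, yet a learner that cannot see u and commits to a fixed action wins only half the time, so the true interventional value is 0.5. `natural_bounds(data, [0, 1])` gives (0.5, 1.0) for both actions, an interval that does contain 0.5.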

Taxonomy

Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications · Advanced Bandit Algorithms Research