Learning without Knowing: Unobserved Context in Continuous Transfer Reinforcement Learning
Chenyu Liu, Yan Zhang, Yi Shen, Michael M. Zavlanos

TL;DR
This paper introduces a novel transfer reinforcement learning approach that handles unobserved context by formulating the problem as a multi-armed bandit constrained by causal bounds, enabling faster and more stable policy learning.
Contribution
It proposes a new method combining causal bounds and bandit algorithms to learn context-unaware policies from expert data with unobserved context, reducing exploration variance.
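The Python sketch below shows one way to combine causal bounds with a bandit algorithm. It rests on simplifying assumptions that are ours, not the paper's:

- the arms are a finite set instead of a continuous action space;
- rewards lie in [0, 1];
- each arm's interventional mean is bounded with Manski-style natural bounds computed from the expert's (action, reward) pairs.

The bandit is UCB1 with each arm's index clipped at its causal upper bound. Arms whose upper bound falls below the best lower bound are discarded before exploration starts. This is one standard bound-constrained UCB variant from the causal transfer-bandit literature, not necessarily the authors' algorithm. The names `natural_bounds` and `causal_ucb` and the demo data are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Bounds = List[Tuple[float, float]]


def natural_bounds(expert_data: Sequence[Tuple[int, float]], n_arms: int) -> Bounds:
    """Hypothetical helper: bounds on E[R | do(a)] from expert (action, reward) pairs.

    Assumes rewards in [0, 1]. The expert's context is unobserved, so E[R | A=a]
    is a biased estimate of E[R | do(a)]. Only P(A=a) and E[R | A=a] are
    identifiable; the rewards the expert would have received by choosing a in
    its other contexts could be anything in [0, 1].
    """
    n = len(expert_data)
    if n == 0:
        return [(0.0, 1.0)] * n_arms  # no data: the trivial bounds
    bounds = []
    for a in range(n_arms):
        rewards = [r for act, r in expert_data if act == a]
        p_a = len(rewards) / n
        mean_a = sum(rewards) / len(rewards) if rewards else 0.0
        lower = mean_a * p_a                # unseen counterfactual rewards all 0
        upper = mean_a * p_a + (1.0 - p_a)  # unseen counterfactual rewards all 1
        bounds.append((lower, upper))
    return bounds


def causal_ucb(pull: Callable[[int], float], bounds: Bounds, horizon: int) -> List[int]:
    """Hypothetical helper: UCB1 with each index clipped at the arm's causal upper bound.

    Returns how many times each arm was pulled.
    """
    n_arms = len(bounds)
    best_lower = max(lo for lo, _ in bounds)
    # An arm whose upper bound is below another arm's lower bound cannot be optimal.
    active = [a for a in range(n_arms) if bounds[a][1] >= best_lower]
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(1, horizon + 1):
        untried = [a for a in active if counts[a] == 0]
        if untried:
            arm = untried[0]  # pull every remaining arm once first
        else:
            arm = max(active, key=lambda a: min(
                sums[a] / counts[a] + math.sqrt(2.0 * math.log(t) / counts[a]),
                bounds[a][1],  # never exceed the causal upper bound
            ))
        reward = pull(arm)
        counts[arm] += 1
        sums[arm] += reward
    return counts


if __name__ == "__main__":
    rng = random.Random(0)
    true_means = [0.5, 0.5, 0.6]  # hypothetical interventional means
    expert = [(0, 1.0), (0, 1.0), (1, 1.0), (1, 0.0)]  # hypothetical expert data
    bounds = natural_bounds(expert, n_arms=3)
    counts = causal_ucb(lambda a: float(rng.random() < true_means[a]), bounds, horizon=1000)
    print(bounds, counts)
```

The clipping step is where the expert data pays off. An arm's optimism can never exceed what the data allow, and arms the bounds rule out are never pulled at all. This is one plausible mechanism by which such bounds cut exploration and its variance.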
Findings
Faster policy improvement compared to existing imitation methods
Lower variance during training
Effective handling of unobserved contextual information
Abstract
In this paper, we consider a transfer Reinforcement Learning (RL) problem in continuous state and action spaces, under unobserved contextual information. For example, the context can represent the mental view of the world that an expert agent has formed through past interactions with this world. We assume that this context is not accessible to a learner agent who can only observe the expert data. Our goal is then to use the context-aware expert data to learn an optimal context-unaware policy for the learner using only a few new data samples. Such problems are typically solved using imitation learning, which assumes that both the expert and learner agents have access to the same information. However, if the learner does not know the expert context, using the expert data alone will result in a biased learner policy and will require many new data samples to improve. To address this…
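A tiny discrete example shows the bias described above. The sketch is a hypothetical toy model, not the paper's continuous setting: a binary context `u` seen only by the expert, three actions, and made-up reward means. It compares two quantities:

- the value an imitator would infer from the expert data, E[R | A = a];
- the value a context-unaware learner actually obtains, E[R | do(A = a)].

The names `P_U`, `MEANS`, `EXPERT`, `observational_value` and `interventional_value` are illustrative only.

```python
from typing import Optional

# Hypothetical toy model (not from the paper): binary context u, three actions.
P_U = [0.5, 0.5]             # P(u)
MEANS = [[0.9, 0.2, 0.6],    # MEANS[u][a] = E[R | u, a]
         [0.1, 0.8, 0.6]]
EXPERT = [0, 1]              # EXPERT[u]: the context-aware expert's action


def observational_value(a: int) -> Optional[float]:
    """E[R | A = a] as seen in the expert data (what an imitator would use)."""
    contexts = [u for u in range(len(P_U)) if EXPERT[u] == a]
    if not contexts:
        return None  # the expert never takes action a, so the data are silent
    mass = sum(P_U[u] for u in contexts)
    return sum(P_U[u] * MEANS[u][a] for u in contexts) / mass


def interventional_value(a: int) -> float:
    """E[R | do(A = a)]: a learner that cannot see u averages over all contexts."""
    return sum(P_U[u] * MEANS[u][a] for u in range(len(P_U)))


if __name__ == "__main__":
    for a in range(3):
        print(a, observational_value(a), interventional_value(a))
```

In this toy model the expert data make actions 0 and 1 look worth 0.9 and 0.8. Without the context, each actually yields 0.5, and action 2, which the expert never takes, yields 0.6. Copying the expert is therefore biased here, and the expert data alone cannot reveal the best context-unaware action.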
Taxonomy
Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications · Advanced Bandit Algorithms Research
