Context Shift Reduction for Offline Meta-Reinforcement Learning
Yunkai Gao, Rui Zhang, Jiaming Guo, Fan Wu, Qi Yi, Shaohui Peng, Siming Lan, Ruizhi Chen, Zidong Du, Xing Hu, Qi Guo, Ling Li, Yunji Chen

TL;DR
This paper introduces CSRO, a novel method for offline meta-reinforcement learning that effectively reduces context shift caused by distribution discrepancies, thereby enhancing generalization to unseen tasks using only offline datasets.
Contribution
The paper proposes a new approach called CSRO that minimizes policy influence on context representations during training and testing, addressing the context shift problem in OMRL.
Findings
CSRO significantly reduces context shift in OMRL.
CSRO improves generalization performance across various domains.
In experiments, CSRO outperforms previous methods on challenging tasks.
Abstract
Offline meta-reinforcement learning (OMRL) utilizes pre-collected offline datasets to enhance the agent's generalization ability on unseen tasks. However, the context shift problem arises due to the distribution discrepancy between the contexts used for training (from the behavior policy) and testing (from the exploration policy). The context shift problem leads to incorrect task inference and further deteriorates the generalization ability of the meta-policy. Existing OMRL methods either overlook this problem or attempt to mitigate it with additional information. In this paper, we propose a novel approach called Context Shift Reduction for OMRL (CSRO) to address the context shift problem with only offline datasets. The key insight of CSRO is to minimize the influence of policy in context during both the meta-training and meta-test phases. During meta-training, we design a max-min…
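The context shift described in the abstract can be illustrated with a toy example. The sketch below is not the CSRO method; it is a hypothetical one-dimensional task family (reward `-|s + a - goal|`) with made-up function names, meant only to show why task inference that leans on policy-dependent signals can break when contexts come from a different policy. One estimator reads the goal from the actions (which works only if the context was collected by a goal-seeking behavior policy), while the other reads it from the rewards (which depend on the task, not on who collected the data).

```python
import numpy as np


def collect_context(goal, policy, n=200, seed=0):
    """Collect (state, action, reward) tuples for a toy 1-D task (assumed setup)."""
    rng = np.random.default_rng(seed)
    s = rng.uniform(-1.0, 1.0, size=n)
    if policy == "behavior":
        # Goal-seeking behavior policy: actions carry information about the goal.
        a = np.clip(0.5 * (goal - s) + 0.1 * rng.normal(size=n), -1.0, 1.0)
    else:
        # Exploration policy: uniformly random actions, unrelated to the goal.
        a = rng.uniform(-1.0, 1.0, size=n)
    r = -np.abs(s + a - goal)
    return s, a, r


def infer_goal_from_actions(s, a, r):
    """Policy-dependent inference: assumes a is roughly 0.5 * (goal - s)."""
    return float(np.mean(s + 2.0 * a))


def infer_goal_from_rewards(s, a, r, grid=None):
    """Policy-independent inference: pick the goal that best explains the rewards."""
    if grid is None:
        grid = np.linspace(-1.0, 1.0, 201)
    x = s + a
    # Squared mismatch between observed rewards and those predicted by each candidate.
    residual = ((r[None, :] + np.abs(x[None, :] - grid[:, None])) ** 2).sum(axis=1)
    return float(grid[np.argmin(residual)])


if __name__ == "__main__":
    true_goal = 0.8
    for policy in ("behavior", "exploration"):
        ctx = collect_context(true_goal, policy)
        print(policy,
              "action-based:", round(infer_goal_from_actions(*ctx), 3),
              "reward-based:", round(infer_goal_from_rewards(*ctx), 3))
```

In this toy setting the action-based estimate is accurate on behavior-policy contexts but collapses toward zero on exploration-policy contexts, whereas the reward-based estimate is unaffected by the change of policy. This mirrors, in a simplified way, the motivation for reducing the influence of the policy on the context.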
Taxonomy
Topics: Reinforcement Learning in Robotics · Machine Learning and Data Classification · Domain Adaptation and Few-Shot Learning
