Entropy Regularized Task Representation Learning for Offline Meta-Reinforcement Learning
Mohammadreza Nakhaei, Aidan Scannell, Joni Pajarinen

TL;DR
This paper introduces an entropy regularization method for offline meta-reinforcement learning that improves task representation quality and enhances generalization to new tasks by reducing overfitting to offline data.
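To make "entropy regularization" concrete, here is a minimal Python sketch of an entropy-regularized objective. This summary does not say which distribution's entropy is regularized or how that distribution is parameterized. The diagonal-Gaussian action distribution and the names diag_gaussian_entropy, regularized_loss and coef are therefore hypothetical assumptions for illustration, not the authors' implementation.

```python
import math
from typing import Sequence


def diag_gaussian_entropy(log_stds: Sequence[float]) -> float:
    """Differential entropy (nats) of a Gaussian with diagonal covariance.

    H = sum_i [0.5 * log(2 * pi * e) + log(sigma_i)]; the mean does not affect it.
    """
    return sum(0.5 * math.log(2 * math.pi * math.e) + ls for ls in log_stds)


def regularized_loss(task_loss: float, log_stds: Sequence[float], coef: float) -> float:
    """Hypothetical entropy-regularized objective to be minimized.

    Subtracting coef * H gives a lower loss to higher-entropy distributions.
    """
    return task_loss - coef * diag_gaussian_entropy(log_stds)


print(diag_gaussian_entropy([0.0]))  # ~1.419, entropy of a standard normal
# Wider distribution (larger log std) -> smaller regularized loss.
print(regularized_loss(1.0, [0.5, 0.5], coef=0.1) < regularized_loss(1.0, [0.0, 0.0], coef=0.1))  # True
```

Because the objective is minimized, the coefficient coef trades off the task loss against pushing the regularized distribution toward higher entropy.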
Contribution
The paper proposes a novel entropy regularization approach that minimizes the mutual information between the task representations and the behavior policy, addressing the distribution mismatch in offline meta-RL.
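The connection between entropy and mutual information follows from the standard identity I(Z; B) = H(B) - H(B | Z): when the marginal entropy H(B) is fixed by the offline data, raising the conditional entropy H(B | Z) lowers the mutual information. The toy Python sketch below computes this for discrete variables. Treating the behavior policy as a discrete label B and the task representation as a discrete Z is a simplifying assumption; task representations in context-based methods are typically continuous.

```python
import math
from typing import List


def entropy(p: List[float]) -> float:
    """Shannon entropy in nats of a discrete distribution."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def mutual_information(joint: List[List[float]]) -> float:
    """I(Z; B) from a joint table joint[z][b] = p(z, b), using I = H(B) - H(B | Z)."""
    p_z = [sum(row) for row in joint]
    p_b = [sum(col) for col in zip(*joint)]
    # H(B | Z) = sum_z p(z) * H(B | Z = z)
    h_b_given_z = sum(
        pz * entropy([pzb / pz for pzb in row])
        for pz, row in zip(p_z, joint)
        if pz > 0
    )
    return entropy(p_b) - h_b_given_z


# Z fully reveals which behavior policy collected the data: MI is maximal (ln 2).
revealing = [[0.5, 0.0], [0.0, 0.5]]
# Z is independent of the behavior policy: H(B | Z) = H(B), so MI is 0.
independent = [[0.25, 0.25], [0.25, 0.25]]
print(mutual_information(revealing))    # ~0.693
print(mutual_information(independent))  # 0.0
```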
Findings
Task representations better capture the underlying tasks.
Improved performance on in-distribution tasks.
Enhanced generalization to out-of-distribution tasks.
Abstract
Offline meta-reinforcement learning aims to equip agents with the ability to rapidly adapt to new tasks by training on data from a set of different tasks. Context-based approaches use a history of state-action-reward transitions (referred to as the context) to infer representations of the current task, and then condition the agent, i.e., the policy and value function, on the task representations. Intuitively, the better the task representations capture the underlying tasks, the better the agent can generalize to new tasks. Unfortunately, context-based approaches suffer from distribution mismatch: the context in the offline data does not match the context at test time, which limits their ability to generalize to the test tasks. As a result, the task representations overfit to the offline training data. Intuitively, the task representations should be independent of the…
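The context-based pipeline described in the abstract (a context of transitions, then a task representation, then a conditioned agent) can be sketched in Python as follows. The summary does not specify the paper's encoder architecture, so the per-transition ReLU layer, the mean pooling (a common permutation-invariant choice), and every function name here are hypothetical assumptions.

```python
from typing import List, Tuple

# A transition is (state, action, reward); states and actions are flat float lists.
Transition = Tuple[List[float], List[float], float]


def embed_transition(t: Transition, weights: List[List[float]]) -> List[float]:
    """Hypothetical per-transition embedding: ReLU(W x) with x = [state, action, reward]."""
    state, action, reward = t
    x = state + action + [reward]
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]


def encode_context(context: List[Transition], weights: List[List[float]]) -> List[float]:
    """Mean-pool transition embeddings into one task representation z.

    Averaging makes z invariant to the order of transitions in the context.
    """
    embeddings = [embed_transition(t, weights) for t in context]
    return [sum(e[k] for e in embeddings) / len(embeddings) for k in range(len(weights))]


def conditioned_input(state: List[float], z: List[float]) -> List[float]:
    """Input to a task-conditioned policy or value function: the state concatenated with z."""
    return state + z


weights = [[1.0, 0.0, 0.0, 0.0],  # reads the first state coordinate
           [0.0, 0.0, 0.0, 1.0]]  # reads the reward
context = [([1.0, 2.0], [0.5], 1.0), ([3.0, 0.0], [-0.5], -1.0)]
z = encode_context(context, weights)
print(z)                                 # [2.0, 0.5]
print(conditioned_input([0.0, 1.0], z))  # [0.0, 1.0, 2.0, 0.5]
```

In this sketch, the distribution mismatch from the abstract shows up when the transitions in context come from a different behavior policy at test time than in the offline data, which changes z even if the task is the same.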
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control
Methods: Sparse Evolutionary Training
