TL;DR
This paper introduces Task-Aware Virtual Training (TAVT), a meta-reinforcement learning (meta-RL) algorithm that improves generalization to out-of-distribution (OOD) tasks by learning metric-based task representations and regularizing states to curb overestimation errors.
Contribution
TAVT is a new algorithm that enhances OOD generalization in meta-RL through metric-based task representation and state regularization techniques.
Findings
TAVT significantly improves OOD task performance in MuJoCo and MetaWorld environments.
The method preserves task characteristics in the virtual tasks used for training.
State regularization mitigates overestimation errors in state-varying environments.
Abstract
Meta reinforcement learning aims to develop policies that generalize to unseen tasks sampled from a task distribution. While context-based meta-RL methods improve task representation using task latents, they often struggle with out-of-distribution (OOD) tasks. To address this, we propose Task-Aware Virtual Training (TAVT), a novel algorithm that accurately captures task characteristics for both training and OOD scenarios using metric-based representation learning. Our method successfully preserves task characteristics in virtual tasks and employs a state regularization technique to mitigate overestimation errors in state-varying environments. Numerical results demonstrate that TAVT significantly enhances generalization to OOD tasks across various MuJoCo and MetaWorld environments. Our code is available at https://github.com/JM-Kim-94/tavt.git.
Taxonomy
Topics: Reinforcement Learning in Robotics · Data Stream Mining Techniques · Advanced Bandit Algorithms Research
