Task-Aware Virtual Training: Enhancing Generalization in Meta-Reinforcement Learning for Out-of-Distribution Tasks

Jeongmo Kim; Yisak Park; Minung Kim; Seungyul Han

arXiv:2502.02834·cs.LG·May 21, 2026

TL;DR

This paper introduces Task-Aware Virtual Training (TAVT), a novel meta-reinforcement learning algorithm that improves out-of-distribution task generalization by capturing task characteristics and regularizing states.

Contribution

TAVT is a new algorithm that enhances OOD generalization in meta-RL through metric-based task representation and state regularization techniques.

Findings

01

TAVT significantly improves OOD task performance in MuJoCo and MetaWorld environments.

02

The method effectively preserves task characteristics in virtual training scenarios.

03

State regularization reduces overestimation errors in dynamic environments.

Abstract

Meta reinforcement learning aims to develop policies that generalize to unseen tasks sampled from a task distribution. While context-based meta-RL methods improve task representation using task latents, they often struggle with out-of-distribution (OOD) tasks. To address this, we propose Task-Aware Virtual Training (TAVT), a novel algorithm that accurately captures task characteristics for both training and OOD scenarios using metric-based representation learning. Our method successfully preserves task characteristics in virtual tasks and employs a state regularization technique to mitigate overestimation errors in state-varying environments. Numerical results demonstrate that TAVT significantly enhances generalization to OOD tasks across various MuJoCo and MetaWorld environments. Our code is available at https://github.com/JM-Kim-94/tavt.git.
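The abstract does not specify which metric or loss TAVT uses, so the following is only a rough illustration of what metric-based representation learning generally means: task latents are trained so that distances between them track a chosen notion of distance between tasks. Everything here is an assumption made for illustration and is not the authors' objective: the function name `metric_alignment_loss`, the Euclidean latent distance, the squared-error form, and the toy goal-velocity tasks.

```python
import math
from itertools import combinations


def metric_alignment_loss(latents, task_distance):
    """Mean squared gap between latent distances and task distances.

    Illustrative only: `latents` is a list of equal-length tuples and
    `task_distance(i, j)` is a hypothetical ground-truth distance between
    tasks i and j. Neither is taken from the TAVT paper.
    """
    pairs = list(combinations(range(len(latents)), 2))
    if not pairs:
        return 0.0
    total = 0.0
    for i, j in pairs:
        latent_dist = math.dist(latents[i], latents[j])  # Euclidean distance
        total += (latent_dist - task_distance(i, j)) ** 2
    return total / len(pairs)


# Hypothetical tasks, each identified by a scalar goal velocity.
goal_velocities = [0.0, 1.0, 3.0]


def velocity_gap(i, j):
    """Assumed task distance: absolute difference of goal velocities."""
    return abs(goal_velocities[i] - goal_velocities[j])


# Latents whose pairwise distances match the task distances exactly.
aligned = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
# Latents that collapse every task onto the same point.
collapsed = [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0)]

loss_aligned = metric_alignment_loss(aligned, velocity_gap)
loss_collapsed = metric_alignment_loss(collapsed, velocity_gap)
```

Under these assumptions, the aligned latents incur zero loss while the collapsed latents are penalized. This is the general sense in which a metric-based objective encourages a latent space to preserve task characteristics; the loss actually used by TAVT should be taken from the paper and the linked repository.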

Code & Models

Repositories

JM-Kim-94/tavt.git
github

Videos

Task-Aware Virtual Training: Enhancing Generalization in Meta-Reinforcement Learning for Out-of-Distribution Tasks · slideslive

Taxonomy

Topics: Reinforcement Learning in Robotics · Data Stream Mining Techniques · Advanced Bandit Algorithms Research