Meta-Reinforcement Learning via Exploratory Task Clustering

Zhendong Chu; Hongning Wang

arXiv:2302.07958 · cs.LG · February 17, 2023 · 1 citation

Open Access

TL;DR

This paper introduces a meta-reinforcement learning method that uses task clustering to exploit structured heterogeneity among tasks, improving sample efficiency and adaptation speed.

Contribution

It proposes a novel exploratory policy for task clustering in meta-RL, enabling better knowledge sharing and more efficient policy adaptation.

Findings

01. Effectively uncovers task clusters in rewards and dynamics

02. Achieves superior sample efficiency over baselines

03. Demonstrates robustness across MuJoCo tasks

Abstract

Meta-reinforcement learning (meta-RL) aims to quickly solve new tasks by leveraging knowledge from prior tasks. However, previous studies often assume a single-mode, homogeneous task distribution, ignoring possible structured heterogeneity among tasks. Leveraging such structures can better facilitate knowledge sharing among related tasks and thus improve sample efficiency. In this paper, we explore the structured heterogeneity among tasks via clustering to improve meta-RL. We develop a dedicated exploratory policy to discover task structures via divide-and-conquer. The knowledge of the identified clusters helps to narrow the search space of task-specific information, leading to more sample-efficient policy adaptation. Experiments on various MuJoCo tasks showed the proposed method can unravel cluster structures effectively in both rewards and state dynamics, proving strong advantages…
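The claim that cluster knowledge narrows the search space of task-specific information can be illustrated with a toy calculation. The sketch below is not the authors' algorithm. It assumes a hypothetical 1-D task parameter (for example, a target velocity) drawn from a two-component Gaussian mixture, plus a handful of noisy exploratory observations of that parameter. All function names and numbers are illustrative assumptions. It first infers the cluster, then estimates the task parameter under that cluster's prior, and compares the result with a single broad prior that ignores the cluster structure.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of N(mean, std^2) at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


def cluster_posterior(obs, cluster_means, cluster_std, noise_std, weights):
    """P(cluster | obs) when the task parameter is drawn from a 1-D Gaussian mixture.

    The sample mean is sufficient here: under cluster k it is distributed as
    N(mean_k, cluster_std^2 + noise_std^2 / n).
    """
    n = len(obs)
    ybar = sum(obs) / n
    marg_std = math.sqrt(cluster_std ** 2 + noise_std ** 2 / n)
    scores = [w * gaussian_pdf(ybar, m, marg_std) for w, m in zip(weights, cluster_means)]
    total = sum(scores)
    return [s / total for s in scores]


def gaussian_update(prior_mean, prior_std, obs, noise_std):
    """Conjugate Gaussian update; returns posterior (mean, std) of the task parameter."""
    prior_prec = 1.0 / prior_std ** 2
    like_prec = len(obs) / noise_std ** 2
    post_var = 1.0 / (prior_prec + like_prec)
    post_mean = post_var * (prior_mean * prior_prec + sum(obs) / noise_std ** 2)
    return post_mean, math.sqrt(post_var)


# Hypothetical setup: two task clusters centred at -2 and +2 (assumed values).
cluster_means = [-2.0, 2.0]
cluster_std = 0.2
noise_std = 1.0
weights = [0.5, 0.5]
obs = [1.7, 2.4, 1.9]  # a few noisy exploratory observations (made up)

# Step 1: use the exploratory observations to identify the cluster.
probs = cluster_posterior(obs, cluster_means, cluster_std, noise_std, weights)
k = max(range(len(probs)), key=lambda i: probs[i])

# Step 2a: adapt within the identified cluster (narrow prior).
clustered_mean, clustered_std = gaussian_update(cluster_means[k], cluster_std, obs, noise_std)

# Step 2b: adapt with one broad prior matching the mixture's overall mean and variance.
overall_mean = sum(w * m for w, m in zip(weights, cluster_means))
overall_var = sum(w * (cluster_std ** 2 + (m - overall_mean) ** 2)
                  for w, m in zip(weights, cluster_means))
single_mean, single_std = gaussian_update(overall_mean, math.sqrt(overall_var), obs, noise_std)

print("cluster probabilities:", probs)
print("posterior std with cluster prior:", clustered_std)
print("posterior std with single broad prior:", single_std)
```

With the same three observations, the posterior over the task parameter is tighter under the cluster-specific prior than under the single broad prior. This is one simple way to read the sample-efficiency argument in the abstract.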

Taxonomy

Topics: Machine Learning and Data Classification · Advanced Multi-Objective Optimization Algorithms · Reinforcement Learning in Robotics