Performance-Weighed Policy Sampling for Meta-Reinforcement Learning

Ibrahim Ahmed; Marcos Quinones-Grueiro; Gautam Biswas

arXiv:2012.06016 · cs.LG · December 14, 2020


TL;DR

This paper introduces Performance-Weighed Policy Sampling, an enhancement to MAML that reuses previously learned policies so that reinforcement learning controllers adapt quickly to new faults. It is demonstrated on a cart-pole system and an aircraft fuel transfer system.

Contribution

The paper proposes a new sampling method for MAML that draws on past experiences to maximize coverage of the policy parameter space, improving fault adaptation in RL-based control.
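One way to make "maximize coverage of the parameter space" concrete is greedy farthest-point (max-min) selection over the stored parameter vectors of previously learned policies. The sketch below is an illustrative assumption, not necessarily the paper's procedure: the function names are hypothetical, parameters are assumed to be flattened into plain lists of floats, and Euclidean distance is assumed as the spread measure.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two flattened parameter vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def farthest_point_subset(params: List[Sequence[float]], k: int, start: int = 0) -> List[int]:
    """Greedily pick k indices of stored parameter vectors that are spread out.

    Hypothetical helper: each step adds the not-yet-chosen vector whose distance
    to its nearest already-chosen vector is largest (max-min selection).
    """
    if not 0 < k <= len(params):
        raise ValueError("k must be between 1 and len(params)")
    chosen = [start]
    # Distance from every stored vector to its closest chosen vector so far.
    nearest = [euclidean(p, params[start]) for p in params]
    while len(chosen) < k:
        candidates = [i for i in range(len(params)) if i not in chosen]
        nxt = max(candidates, key=lambda i: nearest[i])
        chosen.append(nxt)
        # Update nearest-chosen distances with the newly added vector.
        nearest = [min(d, euclidean(p, params[nxt])) for d, p in zip(nearest, params)]
    return chosen


if __name__ == "__main__":
    stored = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [0.0, 4.0]]
    print(farthest_point_subset(stored, 3))  # [0, 2, 3]
```

With this toy data, the near-duplicate vector `[0.1, 0.0]` is skipped in favor of the two distant ones, which is the behavior a span-maximizing sample is meant to have.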

Findings

01

E-MAML combined with PPO adapts to faults faster than standard MAML.

02

The method adapts effectively to faults in both the cart-pole and the aircraft fuel transfer systems.

03

Performance-weighted sampling enhances policy convergence with fewer samples.
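A minimal sketch of performance-weighted sampling, assuming each stored policy has already been scored (for example, by its average return over a few episodes under the new fault) and assuming a softmax weighting with a temperature parameter. Both the scoring and the softmax form are assumptions for illustration, and the function names are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence


def performance_weights(returns: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Softmax over returns: better-performing stored policies get more weight."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    m = max(returns)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp((r - m) / temperature) for r in returns]
    total = sum(exps)
    return [e / total for e in exps]


def sample_policies(returns: Sequence[float], n: int, temperature: float = 1.0,
                    seed: Optional[int] = None) -> List[int]:
    """Draw n stored-policy indices (with replacement) in proportion to performance."""
    rng = random.Random(seed)
    weights = performance_weights(returns, temperature)
    return rng.choices(range(len(returns)), weights=weights, k=n)


if __name__ == "__main__":
    print(performance_weights([1.0, 1.0]))  # [0.5, 0.5]
    print(sample_policies([0.2, 1.5, -0.3], n=5, seed=0))
```

Lowering `temperature` concentrates the draws on the best-scoring policies; raising it moves toward uniform sampling, closer to MAML's random task sampling.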

Abstract

This paper discusses an Enhanced Model-Agnostic Meta-Learning (E-MAML) algorithm that achieves fast convergence of the policy function from a small number of training examples when applied to new learning tasks. Built on top of Model-Agnostic Meta-Learning (MAML), E-MAML maintains a set of policy parameters learned in the environment for previous tasks. We apply E-MAML to develop reinforcement learning (RL)-based online fault-tolerant control schemes for dynamic systems. When a new fault occurs, the enhancement is used to re-initialize the parameters of a new RL policy, which then adapts faster from a small number of samples of system behavior under the new fault. This replaces the random task sampling step in MAML; instead, E-MAML exploits the controller's previously generated experiences. The enhancement is sampled to maximally span the parameter space to facilitate…
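The re-initialization step can be illustrated with first-order MAML (FOMAML) over a fixed set of tasks built from stored policy parameters. This is a toy sketch under loud assumptions: each stored parameter vector is treated as the center of a hypothetical quadratic loss `0.5 * ||theta - center||^2`, which stands in for the RL objective (in the paper, tasks correspond to fault conditions and losses come from policy-gradient training). The function names and learning rates are illustrative, and the first-order approximation is a swapped-in simplification of full MAML.

```python
from typing import List, Sequence

Vector = List[float]


def grad_toy_loss(theta: Sequence[float], center: Sequence[float]) -> Vector:
    """Gradient of the hypothetical toy loss 0.5 * ||theta - center||^2."""
    return [t - c for t, c in zip(theta, center)]


def fomaml_init(task_centers: List[Vector], theta0: Vector, inner_lr: float = 0.1,
                outer_lr: float = 0.5, meta_steps: int = 200) -> Vector:
    """First-order MAML meta-update over a fixed set of toy tasks.

    Each task is one stored parameter vector (standing in for a previous
    fault's policy). Returns a meta-initialization for the new policy.
    """
    theta = list(theta0)
    n_tasks = len(task_centers)
    for _ in range(meta_steps):
        meta_grad = [0.0] * len(theta)
        for center in task_centers:
            # Inner loop: one gradient step adapts theta to this task.
            g = grad_toy_loss(theta, center)
            adapted = [t - inner_lr * gi for t, gi in zip(theta, g)]
            # First-order approximation: use the gradient at the adapted
            # parameters directly, ignoring second derivatives through the inner step.
            g_adapted = grad_toy_loss(adapted, center)
            meta_grad = [m + ga / n_tasks for m, ga in zip(meta_grad, g_adapted)]
        # Outer loop: move the shared initialization against the averaged gradient.
        theta = [t - outer_lr * m for t, m in zip(theta, meta_grad)]
    return theta


if __name__ == "__main__":
    centers = [[0.0, 0.0], [2.0, 0.0], [1.0, 3.0]]
    print(fomaml_init(centers, theta0=[10.0, -10.0]))  # approximately [1.0, 1.0]
```

In this quadratic toy, every adapted gradient is a scaled copy of `theta - center`, so the meta-initialization converges to the centroid of the sampled parameter vectors. Real RL objectives have no such closed form, but the example shows why the choice of which stored policies enter the meta-update directly shapes the starting point for the new fault.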

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Fuel Cells and Related Materials · Adversarial Robustness in Machine Learning

Methods: Model-Agnostic Meta-Learning