Performance-Weighed Policy Sampling for Meta-Reinforcement Learning
Ibrahim Ahmed, Marcos Quinones-Grueiro, Gautam Biswas

TL;DR
This paper introduces Performance-Weighed Policy Sampling, an enhancement to MAML that improves rapid adaptation of reinforcement learning policies to new faults by leveraging previous experiences, demonstrated on control systems.
Contribution
It proposes a novel sampling method for MAML that maximizes parameter space coverage using past experiences, improving fault adaptation in RL-based control.
Findings
E-MAML with PPO outperforms standard MAML in fault adaptation speed.
The method effectively adapts to faults in both cart pole and aircraft fuel transfer systems.
Performance-weighted sampling enhances policy convergence with fewer samples.
Abstract
This paper presents an Enhanced Model-Agnostic Meta-Learning (E-MAML) algorithm that achieves fast convergence of the policy function from a small number of training examples when applied to new learning tasks. Built on top of Model-Agnostic Meta-Learning (MAML), E-MAML maintains a set of policy parameters learned in the environment for previous tasks. We apply E-MAML to developing reinforcement learning (RL)-based online fault-tolerant control schemes for dynamic systems. The enhancement is applied when a new fault occurs, to re-initialize the parameters of a new RL policy that achieves faster adaptation with a small number of samples of system behavior with the new fault. This replaces the random task sampling step in MAML; instead, it exploits the previously generated experiences of the controller. The enhancement is sampled to maximally span the parameter space to facilitate…
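The abstract describes replacing MAML's random task sampling with a draw from stored policy parameters, and the title suggests that the draw is weighted by performance. The sketch below is one possible reading of that idea, not the authors' implementation: it assumes each stored policy is a flat list of floats, that each has already been scored by its return on the new fault, and that weights come from a softmax over those returns. The names `performance_weights`, `sample_policies`, and `temperature` are hypothetical and do not come from the paper.

```python
import math
import random


def performance_weights(returns, temperature=1.0):
    """Softmax over returns (an assumed weighting, not taken from the paper)."""
    top = max(returns)  # subtract the max for numerical stability
    exps = [math.exp((r - top) / temperature) for r in returns]
    total = sum(exps)
    return [e / total for e in exps]


def sample_policies(policy_params, returns, k, temperature=1.0, seed=None):
    """Draw k distinct stored policies, favoring higher return on the new fault."""
    if k > len(policy_params):
        raise ValueError("k cannot exceed the number of stored policies")
    rng = random.Random(seed)
    weights = performance_weights(returns, temperature)
    remaining = list(range(len(policy_params)))
    chosen = []
    for _ in range(k):
        # Floor the weights so an underflowed softmax entry can still be drawn.
        w = [max(weights[i], 1e-12) for i in remaining]
        pick = rng.choices(remaining, weights=w, k=1)[0]
        chosen.append(pick)
        remaining.remove(pick)  # sample without replacement
    return [policy_params[i] for i in chosen]


if __name__ == "__main__":
    # Hypothetical library of three stored policies and their returns.
    library = [[0.1, -0.2], [0.4, 0.3], [-0.5, 0.9]]
    scores = [12.0, 85.0, 40.0]
    print(sample_policies(library, scores, k=2, seed=0))
```

In this sketch, a lower `temperature` concentrates the draw on the best-performing stored policies, while a higher value spreads it more evenly across the library. How the paper balances performance against coverage of the parameter space is not recoverable from the truncated abstract.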
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Fuel Cells and Related Materials · Adversarial Robustness in Machine Learning
Methods: Model-Agnostic Meta-Learning
