Learning a subspace of policies for online adaptation in Reinforcement Learning
Jean-Baptiste Gaya, Laure Soulier, Ludovic Denoyer

TL;DR
This paper introduces a method for online adaptation in reinforcement learning by learning a subspace of policies, enabling better generalization to unseen environment variations without extensive tuning or additional components.
Contribution
The authors propose a novel subspace policy learning approach that improves generalization and online adaptation in RL without relying on meta-RL or extra modules.
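To make the idea of a policy subspace concrete, here is a minimal sketch rather than the authors' implementation. This summary does not say how the subspace is parameterized or how adaptation is carried out. The sketch assumes the subspace is the set of convex combinations of a few anchor parameter vectors, and that adaptation means evaluating a handful of candidate points in the test environment and keeping the best one. The names `interpolate`, `adapt_online`, and `toy_return` are hypothetical, and the toy return function stands in for a real environment rollout.

```python
from typing import Callable, List, Sequence, Tuple


def interpolate(anchors: Sequence[Sequence[float]],
                weights: Sequence[float]) -> List[float]:
    """Return one point of the subspace: a convex combination of the anchors.

    Assumption: the subspace is spanned by convex combinations of anchor
    parameter vectors; the summary above does not specify this.
    """
    assert all(w >= 0 for w in weights), "weights must be non-negative"
    assert abs(sum(weights) - 1.0) < 1e-9, "weights must sum to 1"
    dim = len(anchors[0])
    return [sum(w * a[i] for w, a in zip(weights, anchors)) for i in range(dim)]


def adapt_online(anchors: Sequence[Sequence[float]],
                 evaluate: Callable[[List[float]], float],
                 k: int = 5) -> Tuple[List[float], float]:
    """Try k evenly spaced points on the line between two anchors.

    Each candidate is scored with `evaluate` (a stand-in for running one
    episode in the unseen test environment) and the best one is kept.
    """
    assert len(anchors) == 2 and k >= 2
    best_params: List[float] = []
    best_return = float("-inf")
    for j in range(k):
        alpha = j / (k - 1)
        params = interpolate(anchors, [alpha, 1.0 - alpha])
        ret = evaluate(params)
        if ret > best_return:
            best_params, best_return = params, ret
    return best_params, best_return


def toy_return(params: List[float]) -> float:
    """Hypothetical test environment: return peaks at a hidden target."""
    target = [0.5, -0.5]
    return -sum((p - t) ** 2 for p, t in zip(params, target))


anchors = [[1.0, -1.0], [0.0, 0.0]]
best_params, best_return = adapt_online(anchors, toy_return, k=5)
print(best_params, best_return)
```

In this toy setting no gradient step is taken at test time: adaptation costs only `k` evaluations, which is the kind of lightweight procedure the contribution statement contrasts with meta-RL or extra modules.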
Findings
Outperforms baseline methods on various benchmarks.
Learns policies that achieve high rewards in unseen environments.
Simple to tune and does not require extra components.
Abstract
Deep Reinforcement Learning (RL) is mainly studied in a setting where the training and testing environments are similar. In many practical applications, however, these environments may differ. For instance, in control systems, the robot(s) on which a policy is learned might differ from the robot(s) on which the policy will run. Such differences can be caused by internal factors (e.g., calibration issues, system attrition, defective modules) or by external changes (e.g., weather conditions). There is a need to develop RL methods that generalize well to variations of the training conditions. In this article, we consider the simplest, yet hard-to-tackle, generalization setting, in which the test environment is unknown at train time, forcing the agent to adapt to the system's new dynamics. This online adaptation process can be computationally expensive (e.g., fine-tuning) and cannot rely on meta-RL…
Taxonomy
Topics: Reinforcement Learning in Robotics · Software Reliability and Analysis Research · Adversarial Robustness in Machine Learning
Methods: Test
