Learning to Modulate pre-trained Models in RL
Thomas Schmied, Markus Hofmarcher, Fabian Paischer, Razvan Pascanu, Sepp Hochreiter

TL;DR
This paper investigates catastrophic forgetting in RL when pre-trained models are fine-tuned on new tasks, and proposes Learning-to-Modulate (L2M), a method that retains pre-trained skills while adapting to new tasks and achieves state-of-the-art results on Continual-World.
Contribution
The paper introduces Learning-to-Modulate (L2M), a new method that prevents forgetting in RL fine-tuning by modulating a frozen pre-trained model's information flow.
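The idea of modulating a frozen model can be illustrated with a minimal sketch. The summary above does not specify how L2M implements modulation, so the code below assumes a simple stand-in: a per-task learnable scale and shift applied to the output of a frozen layer. All names (`frozen_layer`, `Modulator`, `train_modulator`) and the toy regression target are hypothetical and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a pre-trained layer; these weights are never updated.
W_frozen = rng.normal(size=(4, 3))
b_frozen = rng.normal(size=3)


def frozen_layer(x):
    """Forward pass through the frozen pre-trained layer."""
    return np.tanh(x @ W_frozen + b_frozen)


class Modulator:
    """Per-task scale and shift on the frozen layer's output (assumed form)."""

    def __init__(self, dim):
        self.scale = np.ones(dim)   # identity initialisation
        self.shift = np.zeros(dim)

    def __call__(self, h):
        return h * self.scale + self.shift


def modulated_forward(x, modulator=None):
    """Without a modulator, this is exactly the pre-trained behaviour."""
    h = frozen_layer(x)
    return h if modulator is None else modulator(h)


def train_modulator(x, target, modulator, lr=0.1, steps=200):
    """Gradient descent on squared error; only modulator parameters change."""
    for _ in range(steps):
        h = frozen_layer(x)
        err = modulator(h) - target
        modulator.scale -= lr * 2 * np.mean(err * h, axis=0)
        modulator.shift -= lr * 2 * np.mean(err, axis=0)
    return modulator


# Toy "new task": match a rescaled, shifted version of the frozen features.
x_new = rng.normal(size=(32, 4))
target_new = 0.5 * frozen_layer(x_new) - 0.3
new_task_modulator = train_modulator(x_new, target_new, Modulator(3))
```

Because the frozen weights are untouched and the modulator is kept separate per task, calling `modulated_forward(x)` without a modulator reproduces the pre-trained outputs exactly, which is the property that avoids forgetting in this simplified setting.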
Findings
L2M outperforms existing fine-tuning methods on Continual-World.
Most fine-tuning approaches cause significant performance degradation on pre-training tasks.
L2M retains pre-trained skills while achieving high performance on new tasks.
Abstract
Reinforcement Learning (RL) has been successful in various domains like robotics, game playing, and simulation. While RL agents have shown impressive capabilities in their specific tasks, they insufficiently adapt to new tasks. In supervised learning, this adaptation problem is addressed by large-scale pre-training followed by fine-tuning to new down-stream tasks. Recently, pre-training on multiple tasks has been gaining traction in RL. However, fine-tuning a pre-trained model often suffers from catastrophic forgetting. That is, the performance on the pre-training tasks deteriorates when fine-tuning on new tasks. To investigate the catastrophic forgetting phenomenon, we first jointly pre-train a model on datasets from two benchmark suites, namely Meta-World and DMControl. Then, we evaluate and compare a variety of fine-tuning methods prevalent in natural language processing, both in…
Taxonomy
Topics: Reinforcement Learning in Robotics · Multimodal Machine Learning Applications · Machine Learning and Data Classification
