Fine-tuning Reinforcement Learning Models is Secretly a Forgetting Mitigation Problem

Maciej Wołczyk, Bartłomiej Cupiał, Mateusz Ostaszewski, Michał Bortkiewicz, Michał Zając, Razvan Pascanu, Łukasz Kuciński, Piotr Miłoś

arXiv:2402.02868 · cs.LG · July 18, 2024 · 2 cites

TL;DR

This paper shows that fine-tuning reinforcement learning models often makes them forget pre-trained capabilities, which hampers transfer; knowledge retention techniques mitigate this forgetting and improve performance.

Contribution

The paper frames forgetting as a key challenge in RL fine-tuning, identifies the conditions under which it occurs, and demonstrates mitigation strategies that improve transfer and final performance.

Findings

01

Knowledge retention techniques mitigate forgetting in RL fine-tuning.

02

The authors report a new state-of-the-art score in NetHack of over 10K points.

03

Forgetting is a common and often catastrophic problem in RL transfer learning.
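Finding 01 refers to knowledge retention techniques in general. As one concrete, widely used example, elastic weight consolidation (EWC) adds the penalty (λ/2) Σ_i F_i (θ_i − θ*_i)² to the fine-tuning loss, pulling each parameter θ_i back toward its pre-trained value θ*_i in proportion to F_i, a diagonal estimate of the Fisher information. The sketch below is a minimal, framework-free version under stated assumptions: parameters and per-sample gradients are flat Python lists, the gradients of log π(a|s) are collected with the pre-trained policy on pre-training data, and `diagonal_fisher`, `ewc_penalty`, and `ewc_grad` are hypothetical helpers, not the API of the authors' repository. It does not claim to reproduce how the paper configures EWC or which retention method it favors.

```python
# Minimal EWC sketch. Assumptions (hypothetical, for illustration only):
# parameters and gradients are flat lists of floats, and per-sample gradients
# of log pi(a|s) were collected with the pre-trained policy.

def diagonal_fisher(per_sample_grads):
    """Estimate the diagonal Fisher information as the mean squared gradient."""
    n = len(per_sample_grads)
    dim = len(per_sample_grads[0])
    return [sum(g[i] ** 2 for g in per_sample_grads) / n for i in range(dim)]


def ewc_penalty(params, anchor, fisher, lam):
    """(lam / 2) * sum_i F_i * (theta_i - theta*_i)^2."""
    return 0.5 * lam * sum(f * (p - a) ** 2 for p, a, f in zip(params, anchor, fisher))


def ewc_grad(params, anchor, fisher, lam):
    """Gradient of ewc_penalty with respect to params: lam * F_i * (theta_i - theta*_i)."""
    return [lam * f * (p - a) for p, a, f in zip(params, anchor, fisher)]


# Usage with made-up numbers: two parameters, three per-sample gradients.
grads = [[0.5, 0.0], [1.0, 0.1], [1.5, -0.1]]
fisher = diagonal_fisher(grads)  # the first parameter gets a much larger weight
anchor = [0.2, -0.3]             # pre-trained parameters theta*
params = [0.7, 0.7]              # parameters after some fine-tuning
print(fisher, ewc_penalty(params, anchor, fisher, lam=2.0))
```

During fine-tuning, the penalty would be added to the RL objective and `ewc_grad` to its gradient: parameters with a large Fisher value, which mattered for the pre-trained behavior, are held close to θ*, while parameters with a small value remain free to adapt to the downstream task.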

Abstract

Fine-tuning is a widespread technique that allows practitioners to transfer pre-trained capabilities, as recently showcased by the successful applications of foundation models. However, fine-tuning reinforcement learning (RL) models remains a challenge. This work conceptualizes one specific cause of poor transfer, accentuated in the RL setting by the interplay between actions and observations: forgetting of pre-trained capabilities. Namely, the model deteriorates on the part of the downstream task's state space that is not visited in the initial phase of fine-tuning, even though it performed well there thanks to pre-training. As a result, the anticipated transfer benefits are lost. We identify conditions under which this problem occurs, showing that it is common and, in many cases, catastrophic. Through a detailed empirical analysis of the challenging NetHack and Montezuma's Revenge environments, we show that standard…
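The forgetting mechanism described in the abstract can be reproduced in a deliberately tiny, hypothetical setting that is not the paper's NetHack or Montezuma's Revenge setup. A linear predictor with shared weights is pre-trained to be correct on every state; the downstream task differs from pre-training only on "near" states; and fine-tuning sees only those near states, as an agent would early on. Because the weights are shared, fitting the near states also shifts predictions on the unvisited "far" states, where the pre-trained model was already correct. All function names, states, and targets are illustrative assumptions.

```python
# Toy illustration (hypothetical setup): a linear model on features [1, x]
# shares its weights across "near" states (seen early in fine-tuning) and
# "far" states (only reached later).

def predict(w, x):
    """Linear model: bias w[0] plus slope w[1] times the state feature x."""
    return w[0] + w[1] * x


def mse(w, data):
    """Mean squared error of the model on a list of (state, target) pairs."""
    return sum((predict(w, x) - y) ** 2 for x, y in data) / len(data)


def finetune(w, data, lr=0.02, steps=2000):
    """Plain gradient descent on the MSE, using only the states in `data`."""
    b, a = w
    n = len(data)
    for _ in range(steps):
        grad_b = sum(2 * (b + a * x - y) for x, y in data) / n
        grad_a = sum(2 * (b + a * x - y) * x for x, y in data) / n
        b, a = b - lr * grad_b, a - lr * grad_a
    return (b, a)


# Pre-trained weights: predict y = x on every state.
w_pre = (0.0, 1.0)

# Downstream task: near states differ from pre-training, far states do not.
near = [(x, x + 3.0) for x in (0.0, 1.0, 2.0)]
far = [(x, x) for x in (6.0, 7.0, 8.0)]

# Fine-tune on near states only; far states are never visited in this phase.
w_ft = finetune(w_pre, near)

print(f"far-state error before: {mse(w_pre, far):.2f}, after: {mse(w_ft, far):.2f}")
# prints: far-state error before: 0.00, after: 9.00
```

Here the far states play the role of the state subspace that fine-tuning does not visit at first: the model fits the near states perfectly, but by the time the agent reaches the far states, the shared weights have already moved away from the pre-trained solution that handled them correctly.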

Code & Models

Repositories

bartekcupial/finetuning-rl-as-cl (official)

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Reinforcement Learning in Robotics