Overcoming Catastrophic Forgetting in Visual Continual Learning with Reinforcement Fine-Tuning
Meng Lou, Hanzhong Guo, Linwei Chen, Yizhou Yu

TL;DR
This paper introduces Retention-aware Policy Optimization (RaPO), a reinforcement fine-tuning method that reduces catastrophic forgetting in visual continual learning through trajectory-level reward shaping and stability mechanisms.
Contribution
The paper proposes RaPO, an RFT approach that combines reward shaping with normalization to mitigate forgetting in challenging visual continual learning scenarios such as class-incremental and domain-incremental learning.
Findings
RaPO outperforms existing methods across five visual continual learning settings.
RaPO substantially reduces catastrophic forgetting while maintaining plasticity.
Empirical results demonstrate RaPO's leading performance in continual learning benchmarks.
Abstract
Recent studies suggest that Reinforcement Fine-Tuning (RFT) is inherently more resilient to catastrophic forgetting than Supervised Fine-Tuning (SFT). However, whether RFT (e.g., GRPO) can effectively overcome forgetting in challenging visual continual learning settings, such as class-incremental learning (CIL) and domain-incremental learning (DIL), remains an open problem. Through a pilot study, we confirm that while RFT consistently outperforms SFT, it still suffers from non-negligible forgetting. We empirically trace this bottleneck to Trajectory-level Drift Agnosticism: among candidate rollouts achieving identical task rewards, the KL divergence from the preceding-task policy varies substantially, which strongly correlates with catastrophic forgetting across sequential tasks. Motivated by this insight, we propose Retention-aware Policy Optimization (RaPO), a simple yet effective RFT…
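The drift observation in the abstract can be illustrated with a toy sketch. It assumes a GRPO-style group-normalized advantage and a simple single-sample estimate of the KL divergence from the preceding-task policy (the summed log-probability gap along a rollout). The function names, rewards, and log-probabilities below are made-up illustrations, not values or code from the paper, and the sketch does not reproduce RaPO's reward shaping, which the truncated abstract does not specify.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: standardize each reward within its rollout group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def trajectory_kl_estimate(logp_current, logp_previous):
    """Single-sample estimate of KL(current || previous-task policy) for one
    rollout: the sum over tokens of log pi_current - log pi_previous."""
    return sum(c - p for c, p in zip(logp_current, logp_previous))


# Toy group of four rollouts; the first two earn the same task reward.
rewards = [1.0, 1.0, 0.0, 0.0]
advantages = group_relative_advantages(rewards)

# Hypothetical per-token log-probabilities under the current policy and
# under the policy frozen after the preceding task.
drift_a = trajectory_kl_estimate([-0.10, -0.20], [-0.15, -0.25])  # small drift
drift_b = trajectory_kl_estimate([-0.10, -0.20], [-1.10, -1.70])  # large drift

# Rollouts 0 and 1 receive the same advantage even though their drift
# from the preceding-task policy differs.
print(advantages[0] == advantages[1], drift_a < drift_b)
```

Because the advantage depends only on the task reward, the update treats the low-drift and high-drift rollouts identically. That is the gap the abstract calls Trajectory-level Drift Agnosticism.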
