ZeroFlow: Overcoming Catastrophic Forgetting is Easier than You Think
Tao Feng, Wei Li, Didi Zhu, Hangjie Yuan, Wendi Zheng, Dan Zhang, Jie Tang

TL;DR
ZeroFlow demonstrates that forward pass-based, gradient-free optimization methods can effectively mitigate catastrophic forgetting in continual learning, offering a practical alternative when gradient information is inaccessible.
Contribution
This paper introduces ZeroFlow, the first benchmark for evaluating gradient-free methods in overcoming forgetting, and uncovers new principles and enhancements for forward pass-based continual learning.
Findings
Forward passes alone can mitigate forgetting effectively.
Gradient-free methods can manage task conflicts and reduce memory demands.
Proposed enhancements improve forgetting resistance using only forward passes.
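To make "forward passes alone" concrete, the sketch below shows one common gradient-free technique: a two-point zeroth-order (SPSA-style) gradient estimate, which needs only two loss evaluations per step and no backpropagation. It is an illustrative assumption, not the ZeroFlow benchmark code. The names zo_gradient, zo_sgd, and quadratic are hypothetical, and the toy quadratic stands in for a model's loss.

```python
import random

def zo_gradient(loss, params, eps=1e-3, rng=None):
    """Estimate a gradient from two forward passes (no backpropagation)."""
    rng = rng or random.Random(0)
    # Random Gaussian perturbation direction, one entry per parameter.
    z = [rng.gauss(0.0, 1.0) for _ in params]
    plus = [p + eps * zi for p, zi in zip(params, z)]
    minus = [p - eps * zi for p, zi in zip(params, z)]
    # Central difference along z approximates the directional derivative.
    scale = (loss(plus) - loss(minus)) / (2.0 * eps)
    return [scale * zi for zi in z]

def zo_sgd(loss, params, lr=0.05, steps=500, eps=1e-3, seed=0):
    """Plain SGD update driven by the zeroth-order estimate above."""
    rng = random.Random(seed)
    params = list(params)
    for _ in range(steps):
        grad = zo_gradient(loss, params, eps, rng)
        params = [p - lr * g for p, g in zip(params, grad)]
    return params

def quadratic(w):
    # Hypothetical stand-in for a model loss; minimum at (3, -1).
    return (w[0] - 3.0) ** 2 + (w[1] + 1.0) ** 2

final = zo_sgd(quadratic, [0.0, 0.0])
```

Here the loss is treated as a black box: zo_sgd never asks for derivatives, which is the setting the summary describes for black-box APIs and non-differentiable systems. The estimate is noisy, so the learning rate and step count usually need to be more conservative than with exact gradients.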
Abstract
Backpropagation provides a generalized configuration for overcoming catastrophic forgetting. Optimizers such as SGD and Adam are commonly used for weight updates in continual learning and continual pre-training. However, access to gradient information is not always feasible in practice due to black-box APIs, hardware constraints, or non-differentiable systems, a challenge we refer to as the gradient bans. To bridge this gap, we introduce ZeroFlow, the first benchmark designed to evaluate gradient-free optimization algorithms for overcoming forgetting. ZeroFlow examines a suite of forward pass-based methods across various algorithms, forgetting scenarios, and datasets. Our results show that forward passes alone can be sufficient to mitigate forgetting. We uncover novel optimization principles that highlight the potential of forward pass-based methods in mitigating forgetting, managing…
Taxonomy
Topics: Topic Modeling · Image Retrieval and Classification Techniques · Knowledge Management and Technology
Methods: Stochastic Gradient Descent · Adam
