More Than Memory Savings: Zeroth-Order Optimization Mitigates Forgetting in Continual Learning

Wanhao Yu; Zheng Wang; Shuteng Niu; Sen Lin; Li Yang

arXiv:2510.21019 · cs.LG · March 13, 2026

TL;DR

This paper shows that zeroth-order (ZO) optimization can reduce forgetting in continual learning by promoting flatter loss landscapes, at some cost to plasticity. It also introduces a hybrid method, ZO-FC, designed to balance stability and plasticity.

Contribution

The paper introduces ZO-FC, a hybrid approach that applies ZO optimization to a PEFT module while training the classifier with first-order (FO) optimization, balancing stability and plasticity in continual learning.
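The hybrid idea described above can be illustrated with a toy sketch. This is not the authors' implementation: the residual linear adapter stands in for a PEFT module, the data is synthetic, and every name here (`forward_loss`, `hybrid_step`, `lr_zo`, `lr_fo`, `eps`) is a hypothetical choice made for illustration. The adapter is updated from a two-point ZO estimate that needs only forward passes, while the classifier uses its exact softmax cross-entropy gradient.

```python
import numpy as np

def forward_loss(adapter, classifier, x, y_onehot):
    """Mean softmax cross-entropy for a toy model (an assumption, not the paper's)."""
    h = x + x @ adapter                       # residual adapter on fixed features
    logits = h @ classifier
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    loss = -np.mean(np.sum(y_onehot * log_probs, axis=1))
    return loss, h, np.exp(log_probs)

def hybrid_step(adapter, classifier, x, y_onehot, rng,
                eps=1e-3, lr_zo=1e-2, lr_fo=1e-1):
    """One update: ZO estimate for the adapter, exact FO gradient for the classifier."""
    # FO part: analytic gradient of the loss with respect to the classifier.
    _, h, probs = forward_loss(adapter, classifier, x, y_onehot)
    grad_classifier = h.T @ (probs - y_onehot) / x.shape[0]

    # ZO part: perturb the adapter along a random direction, two forward passes.
    z = rng.standard_normal(adapter.shape)
    loss_plus, _, _ = forward_loss(adapter + eps * z, classifier, x, y_onehot)
    loss_minus, _, _ = forward_loss(adapter - eps * z, classifier, x, y_onehot)
    grad_adapter = (loss_plus - loss_minus) / (2.0 * eps) * z

    return adapter - lr_zo * grad_adapter, classifier - lr_fo * grad_classifier

# Synthetic 3-class problem, purely for demonstration.
rng = np.random.default_rng(0)
x = rng.standard_normal((64, 4))
labels = np.argmax(x @ rng.standard_normal((4, 3)), axis=1)
y_onehot = np.eye(3)[labels]

adapter = np.zeros((4, 4))
classifier = np.zeros((4, 3))
initial_loss, _, _ = forward_loss(adapter, classifier, x, y_onehot)
for _ in range(200):
    adapter, classifier = hybrid_step(adapter, classifier, x, y_onehot, rng)
final_loss, _, _ = forward_loss(adapter, classifier, x, y_onehot)
```

In this sketch no backward pass ever runs through the adapter, which is where the memory savings of ZO methods would come from in a real network. The classifier gradient needs only the final features and the predicted probabilities.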

Findings

01

ZO optimization leads to flatter loss landscapes and reduces forgetting.

02

ZO-FC achieves a better stability-plasticity trade-off in continual learning.

03

ZO-FC maintains memory efficiency while improving adaptability.

Abstract

Zeroth-order (ZO) optimization has gained attention as a memory-efficient alternative to first-order (FO) methods, particularly in settings where gradient computation is expensive or even impractical. Beyond its memory efficiency, in this work, we investigate ZO optimization for continual learning (CL) as a novel approach to address the plasticity-stability-efficiency trilemma. Through theoretical analysis and empirical evidence, we show that ZO optimization naturally leads to flatter loss landscapes, which in turn reduce forgetting in CL. However, this stability comes at a cost of plasticity: due to its imprecise gradient estimates and slower convergence, ZO optimization tends to be less effective than FO in acquiring new task-specific knowledge, particularly under constrained training budgets. To better understand this trade-off, we conduct a holistic evaluation of ZO optimization…
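The abstract attributes ZO's reduced plasticity to imprecise gradient estimates. A minimal sketch of one common two-point random-direction estimator shows where that imprecision comes from. This estimator is an assumption for illustration and is not necessarily the one used in the paper; `zo_gradient`, `quadratic_loss`, `eps`, and `num_samples` are hypothetical names.

```python
import numpy as np

def zo_gradient(loss_fn, theta, eps=1e-3, num_samples=1, rng=None):
    """Estimate the gradient of loss_fn at theta from function values only.

    Each sample draws a Gaussian direction z and scales it by the central
    finite difference of the loss along z. No backpropagation is needed.
    """
    rng = np.random.default_rng() if rng is None else rng
    grad = np.zeros_like(theta)
    for _ in range(num_samples):
        z = rng.standard_normal(theta.shape)
        delta = loss_fn(theta + eps * z) - loss_fn(theta - eps * z)
        grad += (delta / (2.0 * eps)) * z
    return grad / num_samples

def quadratic_loss(theta):
    # Toy loss whose true gradient is simply theta.
    return 0.5 * float(np.sum(theta ** 2))

theta = np.array([1.0, -2.0, 0.5])
rng = np.random.default_rng(0)

# A single-sample estimate is cheap but noisy.
noisy_estimate = zo_gradient(quadratic_loss, theta, num_samples=1, rng=rng)

# Averaging many directions approaches the true gradient (theta here).
estimate = zo_gradient(quadratic_loss, theta, num_samples=2000, rng=rng)
```

A single sample costs two forward passes but has high variance, while averaging many samples recovers the true gradient at a much higher compute cost. This is consistent with the slower convergence under constrained training budgets that the abstract describes.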
