Why Zeroth-Order Adaptation May Forget Less: A Randomized Shaping Theory
Yao Shu, Jian Mu, Zhongxiang Dai

TL;DR
This paper introduces a randomized shaping theory for zeroth-order (ZO) adaptation in continual learning. It explains why ZO methods may forget less than first-order (FO) methods by analyzing how the adaptation shape exposes retention curvature.
Contribution
The paper provides a local randomized gradient-shaping analysis that clarifies the retention benefits of ZO adaptation, and it proposes the RISE algorithm for an improved stability-plasticity tradeoff.
Findings
ZO reduces mean forgetting relative to FO when the FO direction has above-average retention curvature.
The analysis separates mean-step damage from random exposure, highlighting the role of curvature and blockwise sampling.
RISE applies a calibrated ZO shape to exact FO gradients, enhancing stability in continual learning.
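The first finding can be checked numerically in a toy setting. The sketch below is not the paper's code: it assumes Gaussian random directions u ~ N(0, I), an infinitesimal smoothing radius (so the raw ZO shape is (uᵀg)u), a unit step size, and a quadratic forgetting proxy 0.5·δᵀHδ with a symmetric retention-curvature matrix H. The function name `fo_zo_forgetting` and all parameters are hypothetical.

```python
import numpy as np

def fo_zo_forgetting(g, H, n_samples=200_000, seed=0):
    """Monte Carlo comparison of quadratic forgetting: FO step vs norm-matched ZO step.

    Assumptions (illustrative, not taken from the paper): Gaussian directions,
    raw ZO shape (u^T g) u, unit step size, forgetting proxy 0.5 * delta^T H delta.
    """
    rng = np.random.default_rng(seed)
    d = g.shape[0]
    U = rng.standard_normal((n_samples, d))   # one random direction per row
    G_zo = (U @ g)[:, None] * U               # raw ZO shapes (u^T g) u
    mean_zo = G_zo.mean(axis=0)               # should be close to g (mean alignment)

    # Norm matching: for Gaussian u, E||(u^T g) u||^2 = (d + 2) ||g||^2,
    # so dividing by sqrt(d + 2) fixes the expected squared step norm at ||g||^2.
    G_nm = G_zo / np.sqrt(d + 2)

    zo_forget = 0.5 * ((G_nm @ H) * G_nm).sum(axis=1).mean()
    fo_forget = 0.5 * g @ H @ g
    # Closed form of the ZO expectation under the same Gaussian assumption.
    closed_form = 0.5 * (2 * g @ H @ g + (g @ g) * np.trace(H)) / (d + 2)
    return mean_zo, fo_forget, zo_forget, closed_form

if __name__ == "__main__":
    d = 8
    H = np.diag([10.0] + [1.0] * (d - 1))     # one sharp retention direction
    for axis in (0, 1):                        # gradient along sharp vs flat direction
        g = np.zeros(d)
        g[axis] = 1.0
        _, fo, zo, closed = fo_zo_forgetting(g, H)
        print(f"axis={axis}  FO={fo:.3f}  ZO(MC)={zo:.3f}  ZO(closed)={closed:.3f}")
```

In this toy example the mean curvature is tr(H)/d = 17/8. When the gradient lies along the sharp direction (curvature 10, above the mean), the closed-form ZO forgetting is 1.85 against 5.0 for FO. When it lies along a flat direction (curvature 1, below the mean), ZO gives 0.95 against 0.5 for FO, so the benefit reverses, in line with the stated condition.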
Abstract
Continual learning requires new-task adaptation without damaging previously acquired capabilities. Recent forward-pass and zeroth-order (ZO) results show that low-query adaptation may retain better than first-order (FO) descent, but the usual view of ZO as noisy FO estimation does not explain why. We give a local randomized gradient-shaping analysis: finite differences expose a raw shape that is mean-aligned with FO, while the norm-matched comparator fixes the expected squared adaptation norm. Under this controlled comparison, forgetting depends on how the adaptation shape exposes retention curvature. For norm-matched ZO, the expected shaped retention curvature obeys an exact identity that preserves the isotropic retention floor while contracting only the anisotropic component. Projecting this identity onto the incoming gradient yields the observable FO-ZO quadratic forgetting gap: ZO…
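As an illustration of what such an identity can look like, the following worked derivation covers one concrete setting. The setting is an assumption, not taken from the abstract: Gaussian directions $u \sim \mathcal{N}(0, I_d)$, a vanishing smoothing radius so that the raw ZO shape is $u u^\top g$, and a symmetric retention-curvature matrix $H$ with mean curvature $\bar h = \operatorname{tr}(H)/d$. The paper's own statement may differ in sampling scheme and constants.

```latex
% Gaussian fourth-moment identity (symmetric H):
\mathbb{E}\!\left[ u u^\top H \, u u^\top \right] = 2H + \operatorname{tr}(H)\, I_d .

% Norm matching divides by d + 2, since E||u u^T g||^2 = (d + 2) ||g||^2:
\frac{1}{d+2}\,\mathbb{E}\!\left[ u u^\top H \, u u^\top \right]
  = \bar h\, I_d + \frac{2}{d+2}\,\bigl( H - \bar h\, I_d \bigr),
\qquad \bar h = \frac{\operatorname{tr}(H)}{d} .

% Projecting onto the gradient g gives the FO minus ZO quadratic gap:
g^\top H g \;-\; g^\top\!\Bigl[ \bar h\, I_d + \tfrac{2}{d+2}\bigl( H - \bar h\, I_d \bigr) \Bigr] g
  = \frac{d}{d+2}\,\lVert g \rVert^2 \left( \frac{g^\top H g}{\lVert g \rVert^2} - \bar h \right) .
```

In this setting the isotropic part $\bar h\, I_d$ passes through unchanged while the anisotropic part $H - \bar h\, I_d$ is scaled by $2/(d+2)$, and the gap is positive exactly when the curvature along $g$ exceeds the mean curvature $\bar h$.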
