Hidden Failure Modes of Gradient Modification under Adam in Continual Learning, and Adaptive Decoupled Moment Routing as a Repair

Yuelin Hu, Zhenbo Yu, Zhengxue Cheng, Wei Liu, and Li Song

arXiv:2604.22407 · cs.LG · April 27, 2026


TL;DR

Continual-learning methods that modify gradients upstream (for example, by projection) can fail silently when paired with Adam, collapsing to near-vanilla forgetting. The paper proposes adaptive decoupled routing as a simple repair that prevents this collapse.

Contribution

The paper identifies a failure mode that arises when projected gradients are passed through Adam, and introduces an adaptive decoupled routing technique that stabilizes continual learning across multiple methods.

Findings

01. Shared-routing projection baselines collapse to vanilla forgetting.
02. Adaptive decoupled routing remains stable and improves performance.
03. The failure is linked to Adam's second-moment pathway inflation.

Abstract

Many continual-learning methods modify gradients upstream (e.g., projection, penalty rescaling, replay mixing) while treating Adam as a neutral backend. We show this composition has a hidden failure mode. In a high-overlap, non-adaptive 8-domain continual LM, all shared-routing projection baselines collapse close to vanilla forgetting (12.5–12.8 vs. 13.2). A 0.5% replay buffer is the strongest shared alternative but still reaches 11.6, while fixed-strength decoupling is worse than vanilla at 14.1. Only adaptive decoupled routing remains stable at 9.4, improving over vanilla by 3.8 units. On a 16-domain stream, its gain over the strongest shared-routing projection baseline grows to 4.5–4.8 units. The failure is largely invisible on clean benchmarks. We explain this effect through Adam's second-moment pathway: in the tested regime, projection induces a 1/(1-alpha) inflation of the…
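The abstract's point that Adam is not a neutral backend can be illustrated with a toy scalar example. The sketch below is not the paper's analysis or method. It only shows a well-known property of Adam: when the same modified gradient feeds both moments, shrinking the gradient barely changes the step, because the second moment shrinks with it. The `adam_steps` helper, the `scale` value, and the "decoupled" variant (modified gradient into the first moment, raw gradient into the second) are all assumptions made for illustration. The variant uses a fixed strength and does not model the adaptive part of the proposed repair.

```python
import math


def adam_steps(grads_m, grads_v, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Run scalar Adam and return the per-step update sizes.

    grads_m feeds the first moment and grads_v feeds the second moment.
    Passing the same sequence twice gives ordinary ("shared") Adam.
    Passing different sequences is a hypothetical "decoupled" variant.
    """
    m = 0.0
    v = 0.0
    updates = []
    for t, (gm, gv) in enumerate(zip(grads_m, grads_v), start=1):
        m = beta1 * m + (1 - beta1) * gm
        v = beta2 * v + (1 - beta2) * gv * gv
        m_hat = m / (1 - beta1 ** t)  # bias-corrected first moment
        v_hat = v / (1 - beta2 ** t)  # bias-corrected second moment
        updates.append(lr * m_hat / (math.sqrt(v_hat) + eps))
    return updates


# Toy stream: a constant raw gradient, and an upstream modification that
# keeps only a fraction of it (scale is an illustrative assumption).
raw = [1.0] * 200
scale = 0.1
modified = [scale * g for g in raw]

vanilla = adam_steps(raw, raw)[-1]             # unmodified gradient
shared = adam_steps(modified, modified)[-1]    # modified gradient in both moments
decoupled = adam_steps(modified, raw)[-1]      # modified -> m, raw -> v

print(f"vanilla   step: {vanilla:.6e}")
print(f"shared    step: {shared:.6e}")
print(f"decoupled step: {decoupled:.6e}")
```

In this toy setting the shared step is almost the same size as the vanilla step even though the gradient was reduced tenfold, while the decoupled step shrinks in proportion to `scale`. Real projection methods change gradient direction as well as magnitude, so this scalar example shows only the normalization effect, not the full mechanism the paper studies.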
