Muon-OGD: Muon-based Spectral Orthogonal Gradient Projection for LLM Continual Learning
Binghang Lu, Zheyuan Deng, Runyu Zhang, Bing Hu, Yunhan Zhao, Yuan Tian, Changhong Mou, Guang Lin, Xiaomin Li

TL;DR
Muon-OGD introduces a spectral-norm-aware continual learning framework for LLMs, improving stability and performance by integrating orthogonal projection constraints with spectral norm geometry.
Contribution
The paper proposes Muon-OGD, a spectral-norm-based optimization method that enhances continual learning in LLMs by combining Muon-style updates with orthogonal projections.
Findings
Muon-OGD outperforms fine-tuning and baseline methods on standard benchmarks.
The method remains computationally scalable for large models.
Spectral-norm-aware geometry improves the stability-plasticity trade-off.
Abstract
A central challenge in continual learning for large language models (LLMs) is catastrophic forgetting, where adapting to new tasks can substantially degrade performance on previously learned ones. Existing projection-based methods mitigate such interference by restricting parameter updates to subspaces that are orthogonal to directions associated with past tasks. However, these methods are typically formulated under Euclidean parameter geometry, with update magnitudes and projections governed by the Frobenius norm. The recent empirical success of the Muon optimizer, which applies orthogonalized matrix updates and admits a spectral-norm interpretation, suggests that Frobenius geometry may not be the most effective choice for matrix-valued LLM parameters. Motivated by this observation, we propose Muon-OGD, a spectral-norm-aware continual learning framework that integrates Muon-style…
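To make the two ingredients named in the abstract concrete, here is a minimal, dependency-free sketch: a Frobenius-geometry projection that removes past-task directions from a gradient matrix, and a Muon-style orthogonalization of the result. This is not the paper's algorithm, since the abstract is truncated before the method is described. All function names are hypothetical, the past directions are assumed orthonormal under the Frobenius inner product, and a plain cubic Newton-Schulz iteration is swapped in as a stand-in for whatever orthogonalization routine a real Muon implementation uses.

```python
# Illustrative sketch only: hypothetical names, not the paper's algorithm.

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]

def frobenius_inner(a, b):
    """Frobenius inner product <A, B> = sum_ij A_ij * B_ij."""
    return sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def project_out(grad, past_dirs):
    """Remove the components of grad along each matrix in past_dirs.

    Assumes past_dirs are orthonormal under the Frobenius inner product.
    """
    out = [row[:] for row in grad]
    for d in past_dirs:
        coeff = frobenius_inner(grad, d)
        out = [[o - coeff * x for o, x in zip(ro, rd)] for ro, rd in zip(out, d)]
    return out

def orthogonalize(m, steps=30):
    """Approximate the polar factor U V^T of m with cubic Newton-Schulz.

    Assumption: this iteration stands in for Muon's orthogonalization step.
    """
    norm = frobenius_inner(m, m) ** 0.5
    if norm == 0.0:
        return [row[:] for row in m]
    # Scaling by the Frobenius norm keeps every singular value at most 1,
    # which is inside the convergence region of the iteration.
    x = [[v / norm for v in row] for row in m]
    for _ in range(steps):
        x3 = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * a - 0.5 * b for a, b in zip(rx, r3)] for rx, r3 in zip(x, x3)]
    return x

# Toy example: one past-task direction and one new-task gradient.
grad = [[3.0, 1.0], [1.0, 2.0]]
past = [[[1.0, 0.0], [0.0, 0.0]]]

projected = project_out(grad, past)        # Frobenius-orthogonal to past[0]
update = orthogonalize(projected)          # all singular values close to 1

overlap_before = frobenius_inner(projected, past[0])
overlap_after = frobenius_inner(update, past[0])
```

In this toy case `overlap_before` is zero, while `overlap_after` is roughly -0.707: orthogonalizing after projecting reintroduces a component along the past direction. That is one way to see why the two geometries cannot simply be chained; how Muon-OGD reconciles them is not stated in the truncated abstract.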
