MuonAll: Muon Variant for Efficient Finetuning of Large Language Models

Saurabh Page; Advait Joshi; S. S. Sonawane

arXiv:2511.06086·cs.CL·November 11, 2025

Open Access

TL;DR

This paper introduces MuonAll, a new variant of the Muon optimizer that incorporates all model parameters for efficient finetuning of large language models, demonstrating competitive performance with AdamW.

Contribution

MuonAll extends Muon to cover all model parameters by transforming them into 2D matrices, and is evaluated on finetuning language models of up to half a billion parameters.

Findings

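01  Muon and MuonAll perform on par with AdamW across major benchmarks.

02  Finetuning experiments on publicly available language models of up to half a billion parameters support MuonAll's effectiveness.

03  Distributed implementations of Muon and MuonAll are open-sourced at https://github.com/Saurabh750/optimizer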

Abstract

The Muon optimizer has demonstrated robust results in pretraining language models, but its performance in finetuning existing public pretrained models has not yet been explored. Currently, Muon is used alongside AdamW, which leaves room for improvement by bringing all parameters under Muon. We introduce MuonAll, which incorporates all parameters into Muon by transforming them into 2D matrices. We conduct extensive finetuning experiments across publicly available language models with sizes up to half a billion parameters. Muon and MuonAll perform on par with AdamW across major benchmarks, highlighting their effectiveness as alternative optimizers. We open-source the distributed implementations of Muon and MuonAll, available at https://github.com/Saurabh750/optimizer
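To make the idea of "transforming into 2D matrices" concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The abstract does not say how non-matrix parameters are reshaped, so the choice below (viewing a 1D parameter such as a bias as a 1 x n matrix) is purely an assumption, and the helper names `to_2d` and `muonall_like_step` are hypothetical. The orthogonalization uses the quintic Newton-Schulz iteration and coefficients found in widely shared Muon implementations. Momentum, weight decay, and update scaling are left out for brevity.

```python
import math

# Quintic Newton-Schulz coefficients used in widely shared Muon implementations.
NS_COEFFS = (3.4445, -4.7750, 2.0315)


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def to_2d(param):
    """ASSUMPTION (not from the paper): view a 1D parameter as a 1 x n matrix.

    Parameters that are already 2D are copied unchanged.
    """
    if param and isinstance(param[0], (int, float)):
        return [list(param)]
    return [list(row) for row in param]


def newton_schulz(grad, steps=5, eps=1e-7):
    """Approximately orthogonalize a 2D gradient, as Muon does for weight matrices."""
    a, b, c = NS_COEFFS
    norm = math.sqrt(sum(v * v for row in grad for v in row)) + eps
    x = [[v / norm for v in row] for row in grad]
    transposed = len(x) > len(x[0])  # iterate on the wide orientation
    if transposed:
        x = transpose(x)
    for _ in range(steps):
        gram = matmul(x, transpose(x))
        gram2 = matmul(gram, gram)
        poly = [[b * g + c * g2 for g, g2 in zip(r1, r2)]
                for r1, r2 in zip(gram, gram2)]
        px = matmul(poly, x)
        x = [[a * xv + pv for xv, pv in zip(r1, r2)] for r1, r2 in zip(x, px)]
    return transpose(x) if transposed else x


def muonall_like_step(param, grad, lr=0.02):
    """Hypothetical single update: reshape to 2D, orthogonalize, step, reshape back."""
    was_1d = isinstance(param[0], (int, float))
    p2d, g2d = to_2d(param), to_2d(grad)
    update = newton_schulz(g2d)
    new = [[p - lr * u for p, u in zip(prow, urow)] for prow, urow in zip(p2d, update)]
    return new[0] if was_1d else new


# A bias vector and a weight matrix go through the same code path.
bias = muonall_like_step([0.0, 0.0, 0.0], [3.0, 0.0, 4.0])
weight = muonall_like_step([[1.0, 0.0], [0.0, 1.0]], [[3.0, 0.0], [0.0, 4.0]])
```

Under this particular reshaping, the orthogonalized update for a 1D parameter is roughly its normalized gradient. Other reshaping choices, such as a diagonal matrix, would behave differently, so the sketch only illustrates the general pattern of routing every parameter through one matrix-based update rule.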

Taxonomy

Topics: Machine Learning and Data Classification · Computational Physics and Python Applications · Machine Learning in Materials Science