DP-Muon: Differentially Private Optimization via Matrix-Orthogonalized Momentum

Jihwan Kim; Chenglin Fan

arXiv:2605.12994 · cs.LG · May 14, 2026

TL;DR

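This paper introduces DP-Muon, a differentially private version of the Muon matrix optimizer. It clips per-example matrix gradients, adds Gaussian noise to the clipped lot average, and applies momentum and Newton-Schulz orthogonalization as post-processing, so the Muon-specific steps add no privacy cost. A bias-corrected variant, DP-MuonBC, improves utility in private fine-tuning experiments.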

Contribution

The paper formulates DP-Muon, analyzes its privacy and optimization properties, and proposes a bias-corrected variant, DP-MuonBC, that improves utility at no extra privacy cost.

Findings

01

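DP-Muon inherits the privacy guarantee certified by the corresponding same-lot subsampled Gaussian accountant, with no additional privacy cost from momentum or Newton-Schulz orthogonalization.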

02

The theoretical bounds separate the optimization error, the noise error, and the approximation errors into distinct terms.

03

DP-MuonBC reduces bias and improves utility in private fine-tuning experiments.

Abstract

We study differentially private (DP) training with Muon, a matrix-valued optimizer that updates hidden-layer weights using momentum followed by Newton–Schulz orthogonalization. While DP-SGD is well understood, the interaction between per-example clipping, Gaussian noise, momentum, and nonlinear orthogonalization in Muon has not been systematically analyzed. We formulate DP-Muon, a private Muon procedure that clips per-example matrix gradients, adds Gaussian noise to the clipped lot average, and then applies momentum and Newton–Schulz orthogonalization as post-processing. We prove that DP-Muon inherits the privacy guarantee certified by the corresponding same-lot subsampled Gaussian accountant, with no additional privacy cost from Muon-specific post-processing. On the optimization side, we establish finite-horizon and vanishing stationarity guarantees under per-matrix clipping, with…
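
The procedure described in the abstract (clip, add noise, then momentum and orthogonalization as post-processing) can be sketched for a single weight matrix as follows. This is a minimal illustration, not the authors' implementation. The function names, the hyperparameter defaults, the momentum form `beta * m + g`, the fixed nominal `lot_size`, and the use of the classical cubic Newton-Schulz iteration are all assumptions made here; the paper may use different coefficients, a Nesterov variant, or a different normalization. The sketch does not compute any privacy budget; that is the job of a separate accountant.

```python
import numpy as np


def clip_per_example(grads, clip_norm):
    """Rescale each (m, n) gradient in a (B, m, n) stack to Frobenius norm <= clip_norm."""
    norms = np.linalg.norm(grads, axis=(1, 2), keepdims=True)  # shape (B, 1, 1)
    return grads * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))


def newton_schulz(M, steps=5, eps=1e-7):
    """Approximate the orthogonal polar factor of M (assumed cubic Newton-Schulz map)."""
    X = M / (np.linalg.norm(M) + eps)  # Frobenius scaling keeps singular values in [0, 1]
    for _ in range(steps):
        X = 1.5 * X - 0.5 * X @ X.T @ X  # pushes each nonzero singular value toward 1
    return X


def dp_muon_step(W, momentum, per_example_grads, rng, *, lot_size,
                 lr=0.02, beta=0.95, clip_norm=1.0, noise_multiplier=1.0, ns_steps=5):
    """One hypothetical DP-Muon update for a single weight matrix W."""
    # Privatization: only this part touches the raw per-example gradients.
    clipped = clip_per_example(per_example_grads, clip_norm)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=W.shape)
    noisy_avg = (clipped.sum(axis=0) + noise) / lot_size

    # Post-processing: momentum and orthogonalization see only the noisy average.
    momentum = beta * momentum + noisy_avg
    update = newton_schulz(momentum, steps=ns_steps)
    return W - lr * update, momentum


# Usage with synthetic gradients for a 4 x 6 layer and a lot of 8 examples.
rng = np.random.default_rng(0)
W, m = np.zeros((4, 6)), np.zeros((4, 6))
grads = rng.normal(size=(8, 4, 6))
W, m = dp_muon_step(W, m, grads, rng, lot_size=8)
```

Because momentum and Newton-Schulz consume only `noisy_avg`, they are functions of an already privatized quantity, which is the structural reason the abstract gives for there being no additional privacy cost from the Muon-specific steps.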
