TL;DR
This paper introduces Pion, a spectral high-pass optimizer that improves on Muon in two regimes where Muon's uniform spectral whitening is limiting: vision-language-action (VLA) training and reinforcement learning with verifiable rewards (RLVR). The result is better performance and stability.
Contribution
The paper proposes Pion, a spectral high-pass iteration that replaces Muon’s uniform whitening and improves training stability and effectiveness on VLA and RLVR tasks.
Findings
Pion outperforms Muon and AdamW in VLA training success rates.
Pion achieves higher accuracy on grasp-and-place tasks with a real robot.
Pion remains stable and outperforms the compared baselines on RLVR benchmarks.
Abstract
Muon is a matrix-aware optimizer that leverages Newton-Schulz (NS) iterations to enforce spectral gradient orthogonalization by driving all singular values of the momentum matrix toward 1. While this uniform spectral whitening enhances exploration and outperforms AdamW in LLM pretraining, we show it could lead to fundamental limitations beyond pretraining in two regimes: (i) cross-modality vision-language-action (VLA) training, where inherently low-rank action-module gradients cause amplification of noisy tail directions, and (ii) reinforcement learning with verifiable rewards (RLVR), where low-SNR gradients and the need to preserve per-head specialization from prior training make whitening unstable. To address these challenges, we propose Pion, a drop-in replacement for Muon that preserves its computational efficiency while replacing uniform spectral whitening with a two-stage…
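To make the "uniform spectral whitening" described in the abstract concrete, the following is a minimal sketch of how a Newton-Schulz iteration acts on singular values. It assumes the classical cubic iteration X <- 1.5 X - 0.5 X X^T X, not necessarily the tuned polynomial Muon uses in practice, and it tracks the singular values directly instead of a full matrix: for X = U S V^T, the cubic iteration maps each singular value s to 1.5 s - 0.5 s^3. The function name and the example spectrum are hypothetical, chosen only for illustration. The sketch does not attempt to reproduce Pion's high-pass iteration, which the truncated abstract does not specify.

```python
import math


def newton_schulz_on_singular_values(sigmas, steps=25):
    """Apply the classical cubic Newton-Schulz map to a list of singular values.

    Illustrative assumption: this is the textbook cubic iteration,
    not the exact polynomial used by Muon or Pion.
    """
    # Normalize by the Frobenius norm so every singular value lies in (0, 1].
    fro = math.sqrt(sum(s * s for s in sigmas))
    s = [x / fro for x in sigmas]
    for _ in range(steps):
        # X <- 1.5 X - 0.5 X X^T X acts on each singular value as
        # s <- 1.5 s - 0.5 s^3, whose attracting fixed point is s = 1.
        s = [1.5 * x - 0.5 * x ** 3 for x in s]
    return s


# Hypothetical spectrum: one dominant direction and a weak tail.
spectrum = [1.0, 0.1, 0.01]
whitened = newton_schulz_on_singular_values(spectrum)
print(whitened)
```

In this sketch every singular value, including the one that started 100 times smaller than the dominant direction, ends up at essentially the same magnitude. When the tail directions carry mostly noise, as the abstract argues for low-rank action-module gradients, that equalization is the amplification effect that motivates replacing uniform whitening.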
