SOAP: Improving and Stabilizing Shampoo using Adam
Nikhil Vyas, Depen Morwani, Rosie Zhao, Mujin Kwun, Itai Shapira, David Brandfonbrener, Lucas Janson, Sham Kakade

TL;DR
This paper introduces SOAP, a new optimization algorithm that combines Shampoo's preconditioning with Adam's efficiency, leading to faster training of large language models with fewer iterations and reduced computation time.
Contribution
The paper establishes a formal connection between Shampoo and Adafactor, and designs SOAP, a simpler, more efficient optimizer that improves large-scale language model training.
Findings
SOAP reduces training iterations by over 40% compared to AdamW.
SOAP decreases wall clock time by over 35% relative to AdamW.
SOAP achieves an improvement of approximately 20% over Shampoo in large language model training.
Abstract
There is growing evidence of the effectiveness of Shampoo, a higher-order preconditioning method, over Adam in deep learning optimization tasks. However, Shampoo's drawbacks include additional hyperparameters and computational overhead when compared to Adam, which only updates running averages of first- and second-moment quantities. This work establishes a formal connection between Shampoo (implemented with the 1/2 power) and Adafactor -- a memory-efficient approximation of Adam -- showing that Shampoo is equivalent to running Adafactor in the eigenbasis of Shampoo's preconditioner. This insight leads to the design of a simpler and computationally efficient algorithm: ShampoO with Adam in the Preconditioner's eigenbasis (SOAP). With regard to improving Shampoo's computational efficiency, the most straightforward approach would be to simply…
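The idea of running Adam in the eigenbasis of Shampoo's preconditioner can be sketched for a single 2-D weight matrix as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the names `init_state` and `soap_like_step`, the hyperparameter defaults, the `refresh_every` schedule, and the choice to keep the first moment in the original coordinates are all assumptions made for the example.

```python
import numpy as np

def init_state(shape):
    """Hypothetical optimizer state for one m x n weight matrix."""
    m, n = shape
    return {
        "t": 0,
        "L": np.zeros((m, m)), "R": np.zeros((n, n)),  # Shampoo-style statistics
        "QL": np.eye(m), "QR": np.eye(n),              # current eigenbases
        "M": np.zeros((m, n)),                         # first moment (original basis)
        "V": np.zeros((m, n)),                         # second moment (rotated basis)
    }

def soap_like_step(W, G, state, lr=1e-2, beta1=0.9, beta2=0.99,
                   eps=1e-8, refresh_every=10):
    """One illustrative update: Adam applied to the gradient after rotating it
    into the eigenbasis of the left/right Shampoo statistics."""
    state["t"] += 1
    t = state["t"]

    # Running averages of G G^T and G^T G (the two Kronecker factors).
    state["L"] = beta2 * state["L"] + (1 - beta2) * (G @ G.T)
    state["R"] = beta2 * state["R"] + (1 - beta2) * (G.T @ G)

    # Recompute the eigenvectors only occasionally (assumed schedule).
    # V is left untouched at a refresh, so it is briefly in the old basis.
    if t == 1 or t % refresh_every == 0:
        _, state["QL"] = np.linalg.eigh(state["L"])
        _, state["QR"] = np.linalg.eigh(state["R"])
    QL, QR = state["QL"], state["QR"]

    # First moment is tracked in the original coordinates, then rotated.
    state["M"] = beta1 * state["M"] + (1 - beta1) * G
    G_rot = QL.T @ G @ QR
    M_rot = QL.T @ state["M"] @ QR

    # Adam's elementwise second moment, computed in the rotated coordinates.
    state["V"] = beta2 * state["V"] + (1 - beta2) * G_rot**2

    # Standard Adam bias correction.
    m_hat = M_rot / (1 - beta1**t)
    v_hat = state["V"] / (1 - beta2**t)
    step_rot = m_hat / (np.sqrt(v_hat) + eps)

    # Rotate the step back to the original coordinates and apply it.
    return W - lr * (QL @ step_rot @ QR.T)

if __name__ == "__main__":
    # Toy problem (an assumption for the demo): minimize 0.5 * ||W - T||_F^2.
    rng = np.random.default_rng(0)
    T = rng.standard_normal((4, 6))
    W = np.zeros_like(T)
    state = init_state(W.shape)
    for _ in range(100):
        W = soap_like_step(W, W - T, state)  # gradient of the toy loss is W - T
    print("toy loss:", 0.5 * np.sum((W - T) ** 2))
```

The eigendecomposition is the expensive part, which is why the sketch refreshes it only every few steps while the Adam moments are updated on every step.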
Taxonomy
Topics: Tree Root and Stability Studies · Music Technology and Sound Studies · Animal Vocal Communication and Behavior
Methods: AdamW · Adafactor · Adam
