4-bit Shampoo for Memory-Efficient Network Training

Sike Wang; Pan Zhou; Jia Li; Hua Huang

arXiv:2405.18144 · cs.LG · January 13, 2025 · 1 citation

TL;DR

This paper introduces 4-bit Shampoo, a memory-efficient second-order optimizer that maintains performance comparable to 32-bit versions by quantizing the eigenvector matrix of the preconditioner, enabling large model training with reduced memory.

Contribution

The first 4-bit second-order optimizer, showing that quantizing the eigenvector matrix of the preconditioner enables memory-efficient training with performance comparable to 32-bit versions.

Findings

1. 4-bit Shampoo matches 32-bit performance on image and language tasks.
2. Quantizing the eigenvector matrix outperforms quantizing the preconditioner directly.
3. Linear square quantization performs slightly better than dynamic tree quantization.
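As a rough illustration of what a "linear square" 4-bit codebook could look like, the sketch below squares a uniform grid on [-1, 1] while keeping the sign, so that levels cluster near zero. This listing does not define the mapping, so the codebook formula, the single per-tensor scale, and the function names `linear_square_codebook`, `quantize`, and `dequantize` are all assumptions made for illustration; the paper's exact mapping may differ.

```python
import numpy as np

def linear_square_codebook(bits=4):
    """Hypothetical codebook: signed square of a uniform grid on [-1, 1]."""
    t = np.linspace(-1.0, 1.0, 2 ** bits)
    return np.sign(t) * t ** 2  # 16 increasing levels, denser near zero

def quantize(x, codebook):
    """Map each entry to the index of its nearest codebook level."""
    scale = np.abs(x).max() or 1.0  # one scale for the whole tensor (assumption)
    idx = np.abs(x[..., None] / scale - codebook).argmin(axis=-1)
    return idx.astype(np.uint8), scale

def dequantize(idx, scale, codebook):
    """Recover approximate values from 4-bit indices and the stored scale."""
    return codebook[idx] * scale

codebook = linear_square_codebook()
x = np.random.default_rng(0).standard_normal(1000)
idx, scale = quantize(x, codebook)
x_hat = dequantize(idx, scale, codebook)
```

Each index fits in 4 bits, so two indices could be packed per byte; the packing step is omitted here to keep the sketch short.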

Abstract

Second-order optimizers, maintaining a matrix termed a preconditioner, are superior to first-order optimizers in both theory and practice. The states forming the preconditioner and its inverse root restrict the maximum size of models trained by second-order optimizers. To address this, compressing 32-bit optimizer states to lower bitwidths has shown promise in reducing memory usage. However, current approaches only pertain to first-order optimizers. In this paper, we propose the first 4-bit second-order optimizers, exemplified by 4-bit Shampoo, maintaining performance similar to that of 32-bit ones. We show that quantizing the eigenvector matrix of the preconditioner in 4-bit Shampoo is remarkably better than quantizing the preconditioner itself both theoretically and experimentally. By rectifying the orthogonality of the quantized eigenvector matrix, we enhance the approximation of the…
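The two options contrasted in the abstract can be tried numerically on a toy matrix. The sketch below is not the authors' implementation: it uses a simulated uniform 16-level block-wise quantizer as a stand-in, assumes the inverse fourth root as the inverse root of interest, and assumes a Björck-style iteration as one way to rectify orthogonality. All function names are hypothetical.

```python
import numpy as np

def quantize_4bit(x, block=64):
    """Simulated block-wise 4-bit quantization on a uniform 16-level grid (stand-in)."""
    flat = x.reshape(-1)
    padded = np.pad(flat, (0, (-flat.size) % block)).reshape(-1, block)
    scale = np.abs(padded).max(axis=1, keepdims=True)
    scale[scale == 0] = 1.0
    levels = np.linspace(-1.0, 1.0, 16)
    idx = np.abs((padded / scale)[:, :, None] - levels).argmin(axis=2)
    return (levels[idx] * scale).reshape(-1)[:flat.size].reshape(x.shape)

def inv_fourth_root(lam, U, eps=1e-6):
    """Form U diag(lam^(-1/4)) U^T; eigenvalues are clipped at eps (assumed safeguard)."""
    return (U * np.maximum(lam, eps) ** -0.25) @ U.T

def bjorck(V, steps=1):
    """Björck-style orthonormalization step: V <- 1.5 V - 0.5 V V^T V."""
    for _ in range(steps):
        V = 1.5 * V - 0.5 * V @ (V.T @ V)
    return V

def orth_err(V):
    """Frobenius distance of V^T V from the identity."""
    return np.linalg.norm(V.T @ V - np.eye(V.shape[1]))

rng = np.random.default_rng(0)
G = rng.standard_normal((64, 256))
A = G @ G.T / 256 + 1e-3 * np.eye(64)  # toy positive-definite preconditioner
lam, U = np.linalg.eigh(A)
exact = inv_fourth_root(lam, U)

# Option 1: quantize the preconditioner itself, then take its inverse root.
A_q = quantize_4bit(A)
A_q = (A_q + A_q.T) / 2  # restore symmetry lost to per-block scaling
lam_q, U_q = np.linalg.eigh(A_q)
err_direct = np.linalg.norm(exact - inv_fourth_root(lam_q, U_q))

# Option 2: keep eigenvalues in full precision and quantize only the eigenvectors.
V_q = quantize_4bit(U)
err_eig = np.linalg.norm(exact - inv_fourth_root(lam, V_q))

# Option 2 plus orthogonality rectification of the quantized eigenvector matrix.
V_rect = bjorck(V_q)
err_rect = np.linalg.norm(exact - inv_fourth_root(lam, V_rect))

print(err_direct, err_eig, err_rect)
print(orth_err(V_q), orth_err(V_rect))
```

The printed errors depend on the toy matrix and on the stand-in quantizer, so they illustrate the comparison rather than reproduce the paper's results.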

Code & Models

Repositories

sike-wang/low-bit-shampoo (PyTorch, official)

Taxonomy

Topics: Experimental Learning in Engineering