SMMF: Square-Matricized Momentum Factorization for Memory-Efficient Optimization

Kwangryeol Park; Seulki Lee

arXiv:2412.08894·cs.LG·May 1, 2025

TL;DR

SMMF is a memory-efficient optimizer that reduces the memory requirement of adaptive optimizers such as Adam by up to 96%, while maintaining comparable performance on CNN and Transformer models.

Contribution

It presents a novel square-matricization technique for flexible, low-rank momentum tensor factorization applicable to various deep learning architectures.

Findings

01

Reduces the memory requirement of adaptive optimizers such as Adam by up to 96%.

02

Achieves similar model performance on CNN and Transformer tasks.

03

Provides theoretical convergence guarantees similar to AdamNC.

Abstract

We propose SMMF (Square-Matricized Momentum Factorization), a memory-efficient optimizer that reduces the memory requirement of the widely used adaptive learning rate optimizers, such as Adam, by up to 96%. SMMF enables flexible and efficient factorization of an arbitrary rank (shape) of the first and second momentum tensors during optimization, based on the proposed square-matricization and one-time single matrix factorization. From this, it becomes effectively applicable to any rank (shape) of momentum tensors, i.e., bias, matrix, and any rank-d tensors, prevalent in various deep model architectures, such as CNNs (high rank) and Transformers (low rank), in contrast to existing memory-efficient optimizers that apply only to a particular (rank-2) momentum tensor, e.g., linear layers. We conduct a regret bound analysis of SMMF, which shows that it converges similarly to…
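
The square-matricization idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the helper names (square_matricize_shape, rank1_factors, reconstruct) are hypothetical, the example tensor shape is made up, and the rank-1 factorization shown is an Adafactor-style row-sum/column-sum scheme assumed here for illustration, since the abstract does not spell out the factorization. The sketch handles only a non-negative tensor (as a second-moment estimate would be) and does not cover how signs of the first momentum might be stored.

```python
import math
import numpy as np

def square_matricize_shape(numel: int) -> tuple[int, int]:
    """Return (n, m) with n * m == numel, n <= m, and n as close to sqrt(numel) as possible.

    Assumes numel >= 1.
    """
    n = math.isqrt(numel)
    while numel % n != 0:
        n -= 1
    return n, numel // n

def rank1_factors(matrix: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Assumed Adafactor-style rank-1 factors of a non-negative matrix: row and column sums."""
    return matrix.sum(axis=1), matrix.sum(axis=0)

def reconstruct(row: np.ndarray, col: np.ndarray) -> np.ndarray:
    """Rank-1 approximation: outer product of the factors, normalized by the total sum."""
    return np.outer(row, col) / row.sum()

# Hypothetical non-negative momentum tensor shaped like a conv kernel (rank 4).
rng = np.random.default_rng(0)
v = rng.random((64, 32, 3, 3))

# Reshape the tensor, whatever its rank, into a near-square matrix.
n, m = square_matricize_shape(v.size)
v_mat = v.reshape(n, m)

# Keep only two vectors (n + m numbers) instead of n * m numbers.
row, col = rank1_factors(v_mat)
v_hat = reconstruct(row, col).reshape(v.shape)

stored = row.size + col.size
print(f"matrix shape: ({n}, {m}); stored {stored} of {v.size} values")
```

For a fixed number of elements, n + m is smallest when n and m are as close as possible, which is why a near-square reshape is the natural choice if the goal is to minimize the size of the two stored vectors.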

Code & Models

Repositories

eai-lab/SMMF (PyTorch, official)

Taxonomy

Topics: Video Analysis and Summarization · Advanced Data Compression Techniques

Methods: Attention Is All You Need · Dropout · Position-Wise Feed-Forward Layer · Softmax · Dense Connections · Byte Pair Encoding · Linear Layer · Multi-Head Attention · Label Smoothing · Layer Normalization