MicroAdam: Accurate Adaptive Optimization with Low Space Overhead and Provable Convergence

Ionut-Vlad Modoranu, Mher Safaryan, Grigory Malinovsky, Eldar Kurtic, Thomas Robert, Peter Richtarik, Dan Alistarh

arXiv:2405.15593 · cs.LG · November 6, 2024 · 1 citation

TL;DR

MicroAdam is a memory-efficient variant of Adam that compresses gradient information with error feedback while maintaining convergence guarantees and practical performance on both million-scale (BERT) and billion-scale (LLaMA) models.

Contribution

We introduce MicroAdam, a novel optimizer that reduces memory overhead through gradient compression with error feedback, while preserving convergence guarantees.
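The error-feedback idea can be illustrated with a minimal sketch. It assumes top-k sparsification as the compressor and keeps the residual uncompressed, as in the classical mechanism from distributed optimization; the function names `top_k` and `ef_step` are hypothetical, and this is not the authors' code.

```python
from typing import List, Tuple


def top_k(x: List[float], k: int) -> List[float]:
    """Keep the k entries of largest magnitude and zero out the rest."""
    keep = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)[:k]
    out = [0.0] * len(x)
    for i in keep:
        out[i] = x[i]
    return out


def ef_step(grad: List[float], error: List[float], k: int) -> Tuple[List[float], List[float]]:
    """One error-feedback step: compress (grad + error), carry the residual forward."""
    acc = [g + e for g, e in zip(grad, error)]
    compressed = top_k(acc, k)
    new_error = [a - c for a, c in zip(acc, compressed)]
    return compressed, new_error


# Usage: two steps with k = 1 on a 4-dimensional gradient.
error = [0.0] * 4
c1, error = ef_step([0.5, -2.0, 0.25, 1.0], error, k=1)   # c1 = [0.0, -2.0, 0.0, 0.0]
c2, error = ef_step([0.5, 0.0, 0.25, 0.25], error, k=1)   # c2 = [0.0, 0.0, 0.0, 1.25]
```

Because the residual is carried forward, coordinates dropped at one step are added back at later steps, so the transmitted updates plus the remaining error always sum to the accumulated gradients.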

Findings

01

MicroAdam substantially reduces optimizer-state memory on large models.

02

Its theoretical convergence guarantees are competitive with those of AMSGrad, and it converges comparably to Adam in practice.

03

It runs efficiently on GPUs for billion-scale models.
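To make the memory argument concrete, here is some back-of-the-envelope arithmetic. The dense Adam figure follows from storing two 32-bit moment estimates per parameter. The compressed-state function is hypothetical: a window of 10 sparse gradients at 1% density, 16-bit values, 16-bit indices (assumed to be stored relative to fixed-size blocks so that 16 bits suffice), and a 4-bit error buffer are illustrative assumptions, not numbers taken from the paper.

```python
def adam_state_bytes(d: int, bytes_per_value: int = 4) -> int:
    """Dense Adam stores two moment estimates (m and v) per parameter."""
    return 2 * d * bytes_per_value


def window_state_bytes(d: int, k: int, m: int, value_bytes: int,
                       index_bytes: int, error_bits: int) -> int:
    """Hypothetical compressed state: m sparse gradients of k entries each,
    plus an error buffer quantized to error_bits per parameter."""
    window = m * k * (value_bytes + index_bytes)
    error = d * error_bits // 8
    return window + error


d = 7_000_000_000  # a 7B-parameter model
print(adam_state_bytes(d) / 1e9)                              # 56.0 (GB)
print(window_state_bytes(d, k=d // 100, m=10, value_bytes=2,
                         index_bytes=2, error_bits=4) / 1e9)  # 6.3 (GB)
```

Under these assumed settings the optimizer state shrinks by roughly a factor of nine; the savings actually reported for MicroAdam depend on the paper's own configuration.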

Abstract

We propose a new variant of the Adam optimizer called MicroAdam that specifically minimizes memory overheads while maintaining theoretical convergence guarantees. We achieve this by compressing the gradient information before it is fed into the optimizer state, thereby reducing its memory footprint significantly. We control the resulting compression error via a novel instance of the classical error feedback mechanism from distributed optimization, in which the error correction information is itself compressed to allow for practical memory gains. We prove that the resulting approach maintains theoretical convergence guarantees competitive with those of AMSGrad, while providing good practical performance. Specifically, we show that MicroAdam can be implemented efficiently on GPUs: on both million-scale (BERT) and billion-scale (LLaMA) models, MicroAdam provides practical…
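The two ingredients described in the abstract, compressing the gradient before it enters the optimizer state and compressing the error-feedback buffer itself, can be combined in a simplified sketch. It assumes top-k sparsification for the gradient, uniform min-max quantization for the error, and Adam-style moments recomputed as exponentially weighted sums over a sliding window of the last m sparse gradients, without bias correction. The class `SketchCompressedAdam` and its parameters are hypothetical; this is not the authors' implementation or their exact update rule.

```python
from collections import deque
from typing import Deque, List, Tuple


def quantize(x: List[float], bits: int) -> Tuple[List[int], float, float]:
    """Uniform min-max quantization of x to integer codes in [0, 2**bits - 1]."""
    lo, hi = min(x), max(x)
    scale = (hi - lo) / (2 ** bits - 1) if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in x]
    return codes, lo, scale


def dequantize(codes: List[int], lo: float, scale: float) -> List[float]:
    return [lo + c * scale for c in codes]


def top_k_sparse(x: List[float], k: int) -> Tuple[List[int], List[float]]:
    """Indices and values of the k largest-magnitude entries."""
    idx = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)[:k]
    return idx, [x[i] for i in idx]


class SketchCompressedAdam:
    """Hypothetical sketch: sparse gradient window + quantized error feedback."""

    def __init__(self, d: int, k: int, m: int = 10, bits: int = 4,
                 beta1: float = 0.9, beta2: float = 0.999, eps: float = 1e-8):
        self.d, self.k, self.bits = d, k, bits
        self.beta1, self.beta2, self.eps = beta1, beta2, eps
        # Only the last m sparse gradients are kept (oldest dropped automatically).
        self.window: Deque[Tuple[List[int], List[float]]] = deque(maxlen=m)
        # Error-feedback buffer, stored in quantized form.
        self.err = quantize([0.0] * d, bits)

    def step(self, params: List[float], grad: List[float], lr: float) -> List[float]:
        # 1) Add back the (dequantized) error from previous steps.
        acc = [g + e for g, e in zip(grad, dequantize(*self.err))]
        # 2) Compress the corrected gradient with top-k.
        idx, vals = top_k_sparse(acc, self.k)
        # 3) The residual becomes the new error, itself quantized to `bits` bits.
        residual = list(acc)
        for i in idx:
            residual[i] = 0.0
        self.err = quantize(residual, self.bits)
        # 4) Store the sparse gradient, then rebuild dense moments from the window.
        self.window.append((idx, vals))
        m = [0.0] * self.d
        v = [0.0] * self.d
        for age, (ii, vv) in enumerate(reversed(self.window)):
            w1 = (1 - self.beta1) * self.beta1 ** age
            w2 = (1 - self.beta2) * self.beta2 ** age
            for i, g in zip(ii, vv):
                m[i] += w1 * g
                v[i] += w2 * g * g
        # 5) Adam-style update (no bias correction in this sketch).
        return [p - lr * mi / (vi ** 0.5 + self.eps)
                for p, mi, vi in zip(params, m, v)]


# Usage: with k = 1, only params[1] moves on the first step.
opt = SketchCompressedAdam(d=4, k=1, m=2)
params = opt.step([0.0] * 4, [0.5, -2.0, 0.25, 1.0], lr=0.1)
```

Only the window of sparse gradients and the low-bit error codes persist between steps, which is where the memory saving comes from; the dense moment vectors are rebuilt on the fly at each step.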

Code & Models

Repositories

ist-daslab/microadam (PyTorch, official)

Videos

MicroAdam: Accurate Adaptive Optimization with Low Space Overhead and Provable Convergence (SlidesLive)

Taxonomy

Topics: Metaheuristic Optimization Algorithms Research

Methods: AMSGrad · Adam
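For reference, AMSGrad, listed among the methods above, differs from Adam in one step: it normalizes by the running maximum of the second-moment estimate rather than by the estimate itself. Below is a minimal sketch of one step on plain Python lists, without bias correction; `amsgrad_step` is a hypothetical helper name.

```python
import math
from typing import List, Tuple


def amsgrad_step(theta: List[float], g: List[float], m: List[float],
                 v: List[float], v_hat: List[float], lr: float = 1e-3,
                 beta1: float = 0.9, beta2: float = 0.999, eps: float = 1e-8
                 ) -> Tuple[List[float], List[float], List[float], List[float]]:
    # First and second moment estimates, exactly as in Adam.
    m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, g)]
    v = [beta2 * vi + (1 - beta2) * gi * gi for vi, gi in zip(v, g)]
    # AMSGrad's change: keep the elementwise running maximum of v.
    v_hat = [max(a, b) for a, b in zip(v_hat, v)]
    # Adam would divide by sqrt(v) here; AMSGrad divides by sqrt(v_hat).
    theta = [t - lr * mi / (math.sqrt(vh) + eps)
             for t, mi, vh in zip(theta, m, v_hat)]
    return theta, m, v, v_hat
```

Because the denominator can never shrink, the effective per-coordinate step size is non-increasing, which is the property that AMSGrad-style convergence analyses rely on.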