Memory Efficient Mixed-Precision Optimizers

Basile Lewandowski; Atli Kosson

arXiv:2309.12381 · cs.LG · September 25, 2023 · 1 citation

Open Access

TL;DR

This paper introduces memory-efficient mixed-precision optimizers that drop the single-precision copy of the parameters and run the optimizer step during back-propagation, so that gradients do not have to be kept. The authors report up to 25% lower peak memory use and 15% faster training at the same level of accuracy.

Contribution

An algorithm that reduces memory use during training in two ways: it removes the single-precision copy of the parameters, and it executes the optimizer step during back-propagation so that gradient values need not be stored.
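As background for why conventional mixed-precision training keeps a higher-precision copy of the parameters at all, the sketch below accumulates many small updates once in half precision and once in single precision. It does not reproduce the paper's algorithm; it only illustrates the rounding problem that such a copy normally guards against. The helper names (`to_half`, `to_single`, `accumulate`) and the chosen values are assumptions made for illustration.

```python
import struct


def to_half(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_single(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def accumulate(start, update, steps, cast):
    """Add `update` to `start` `steps` times, rounding with `cast` after each add."""
    value = cast(start)
    for _ in range(steps):
        value = cast(value + update)
    return value


# Near 1.0, half precision has a spacing of 2**-10 (about 0.001), so an
# update of 1e-4 is rounded away on every single step.
half_only = accumulate(1.0, 1e-4, 1000, to_half)

# Single precision is fine-grained enough to retain each update.
with_master = accumulate(1.0, 1e-4, 1000, to_single)

print(half_only)             # 1.0: every update was lost
print(to_half(with_master))  # the half-precision view of the accumulated value
```

Any scheme that keeps only half-precision parameters has to deal with this effect in some way; how the paper does so is described in the paper itself, not here.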

Findings

01. Up to 25% reduction in peak memory usage
02. 15% faster training times
03. Maintains the same model accuracy

Abstract

Traditional optimization methods rely on the use of single-precision floating point arithmetic, which can be costly in terms of memory size and computing power. However, mixed precision optimization techniques leverage the use of both single and half-precision floating point arithmetic to reduce memory requirements while maintaining model accuracy. We provide here an algorithm to further reduce memory usage during the training of a model by getting rid of the floating point copy of the parameters, virtually keeping only half-precision numbers. We also explore the benefits of getting rid of the gradient's value by executing the optimizer step during the back-propagation. In practice, we achieve up to 25% lower peak memory use and 15% faster training while maintaining the same level of accuracy.
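The idea of executing the optimizer step during back-propagation can be sketched on a toy model. The example below is a minimal illustration, not the authors' implementation: it assumes a chain of scalar weights, a squared-error loss, and plain SGD, and every function name in it is hypothetical. The standard step stores one gradient per weight before updating, while the fused step applies each update as soon as its gradient is available and then discards it.

```python
def forward(weights, x):
    """Return all activations of the chain y = w_n * ... * w_1 * x."""
    acts = [x]
    for w in weights:
        acts.append(w * acts[-1])
    return acts


def train_step_standard(weights, x, target, lr):
    """Backward pass that stores every gradient, then one SGD update."""
    acts = forward(weights, x)
    upstream = acts[-1] - target          # dL/dy for L = 0.5 * (y - target)**2
    grads = [0.0] * len(weights)          # one stored gradient per weight
    for i in reversed(range(len(weights))):
        grads[i] = upstream * acts[i]     # dL/dw_i
        upstream = upstream * weights[i]  # dL/d(input of layer i)
    new_weights = [w - lr * g for w, g in zip(weights, grads)]
    return new_weights, len(grads)        # second value: gradients held at once


def train_step_fused(weights, x, target, lr):
    """Backward pass that updates each weight as soon as its gradient exists."""
    weights = list(weights)               # leave the caller's list untouched
    acts = forward(weights, x)
    upstream = acts[-1] - target
    for i in reversed(range(len(weights))):
        grad = upstream * acts[i]         # the only gradient alive right now
        # Propagate with the pre-update weight, then update and drop `grad`.
        upstream = upstream * weights[i]
        weights[i] -= lr * grad
    return weights, 1                     # at most one gradient held at once


w0 = [0.5, -1.5, 2.0]
std_weights, std_live = train_step_standard(w0, 1.0, 0.25, 0.1)
fused_weights, fused_live = train_step_fused(w0, 1.0, 0.25, 0.1)
print(std_weights == fused_weights)  # True: both steps give the same weights
print(std_live, fused_live)          # 3 1
```

The ordering inside the fused loop matters: the gradient passed to the previous layer must be computed from the weight as it was during the forward pass, so the update happens only after that product is taken. A fused step of this kind also cannot use quantities that depend on all gradients at once, such as a global gradient norm, without extra work.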

Taxonomy

Topics: Numerical Methods and Algorithms · Model Reduction and Neural Networks · Digital Filter Design and Implementation