OLLA: Optimizing the Lifetime and Location of Arrays to Reduce the Memory Usage of Neural Networks

Benoit Steiner; Mostafa Elhoushi; Jacob Kahn; James Hegarty

arXiv:2210.12924 · cs.LG · November 4, 2022 · 1 citation

TL;DR

OLLA is an algorithm that optimizes the lifetime and memory placement of tensors during neural network training. It significantly reduces memory usage without altering the models or their training procedures, enabling faster and more memory-efficient training.

Contribution

OLLA introduces a novel approach based on integer linear programming (ILP) that jointly optimizes when tensors are live and where they are placed in memory, without any modification to the model.
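To make the "lifetime" half of the problem concrete, the sketch below uses a small, hypothetical computation graph (the node names, sizes, and edges are invented for illustration). It measures peak memory for a given execution order, assuming each tensor is allocated when its producer runs and freed right after its last consumer runs. The best order is found by exhaustive search over topological orders, a stand-in that only works for tiny graphs; OLLA instead encodes the problem as an ILP and hands it to an off-the-shelf solver.

```python
from itertools import permutations

# Hypothetical toy graph (not from the paper): each node produces one tensor
# of the given size in bytes, and INPUTS lists the tensors each node consumes.
SIZES = {"a": 4, "b": 8, "c": 1, "d": 8, "e": 1, "f": 2}
INPUTS = {"a": [], "b": ["a"], "c": ["a"], "d": ["c"], "e": ["b"], "f": ["d", "e"]}


def is_topological(order):
    """True if every node runs after all of the tensors it consumes."""
    seen = set()
    for node in order:
        if any(dep not in seen for dep in INPUTS[node]):
            return False
        seen.add(node)
    return True


def peak_memory(order):
    """Peak live bytes when tensors live from their producer's step
    through their last consumer's step (inclusive)."""
    position = {node: i for i, node in enumerate(order)}
    last_use = {}
    for node in order:
        consumers = [position[user] for user in order if node in INPUTS[user]]
        # Tensors with no consumer (graph outputs) only live at their own step.
        last_use[node] = max(consumers, default=position[node])
    peak = 0
    for step in range(len(order)):
        live = sum(SIZES[n] for n in order if position[n] <= step <= last_use[n])
        peak = max(peak, live)
    return peak


def best_order():
    """Exhaustive search (exponential!) for the order with the lowest peak."""
    candidates = (p for p in permutations(SIZES) if is_topological(p))
    return min(candidates, key=peak_memory)


if __name__ == "__main__":
    naive = ("a", "b", "c", "d", "e", "f")
    best = best_order()
    print(naive, peak_memory(naive))  # ('a', 'b', 'c', 'd', 'e', 'f') 17
    print(best, peak_memory(best))    # ('a', 'b', 'c', 'e', 'd', 'f') 13
```

Simply running `e` before `d` frees the 8-byte tensor `b` before the 8-byte tensor `d` is allocated, lowering the peak from 17 to 13 bytes with no change to the computation itself.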

Findings

1. Reduces neural network memory usage by one-third on average.
2. Runs in minutes, often in seconds, even for large networks.
3. Does not require changes to existing training procedures.

Abstract

The size of deep neural networks has grown exponentially in recent years. Unfortunately, hardware devices have not kept pace with the rapidly increasing memory requirements. To cope with this, researchers have turned to techniques such as spilling and recomputation, which increase training time, or reduced precision and model pruning, which can affect model accuracy. We present OLLA, an algorithm that optimizes the lifetime and memory location of the tensors used to train neural networks. Our method reduces the memory usage of existing neural networks, without needing any modification to the models or their training procedures. We formulate the problem as a joint integer linear program (ILP). We present several techniques to simplify the encoding of the problem, and enable our approach to scale to the size of state-of-the-art neural networks using an off-the-shelf ILP solver. We…
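The "location" half can be illustrated in a similar way. Assuming tensor lifetimes are already fixed (here, hypothetical intervals for one schedule of a toy graph), each tensor needs an offset in a single memory arena so that tensors live at the same time never share addresses. The sketch uses greedy first-fit placement in decreasing size order, a common heuristic swapped in purely for illustration. It is not OLLA's ILP-based placement, and the `Tensor` class and helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Tensor:
    name: str
    size: int   # bytes
    start: int  # first step at which the tensor is live
    end: int    # last step at which the tensor is live (inclusive)


def overlaps_in_time(a, b):
    return a.start <= b.end and b.start <= a.end


def place(tensors):
    """Greedy first-fit: place larger tensors first, each at the lowest offset
    that does not collide with an already-placed, time-overlapping tensor."""
    offsets = {}
    placed = []
    for t in sorted(tensors, key=lambda t: t.size, reverse=True):
        # Address ranges [lo, hi) held by tensors live at the same time as t.
        busy = sorted((offsets[p.name], offsets[p.name] + p.size)
                      for p in placed if overlaps_in_time(p, t))
        offset = 0
        for lo, hi in busy:
            if offset + t.size <= lo:
                break  # t fits in the gap below this range
            offset = max(offset, hi)
        offsets[t.name] = offset
        placed.append(t)
    return offsets


def arena_size(tensors, offsets):
    return max(offsets[t.name] + t.size for t in tensors)


def peak_live(tensors):
    """Lower bound on arena size: the most bytes live at any single step."""
    steps = range(min(t.start for t in tensors), max(t.end for t in tensors) + 1)
    return max(sum(t.size for t in tensors if t.start <= s <= t.end) for s in steps)


if __name__ == "__main__":
    # Hypothetical lifetimes, chosen only for illustration.
    tensors = [
        Tensor("a", 4, 0, 2), Tensor("b", 8, 1, 3), Tensor("c", 1, 2, 4),
        Tensor("d", 8, 4, 5), Tensor("e", 1, 3, 5), Tensor("f", 2, 5, 5),
    ]
    offsets = place(tensors)
    print(offsets)
    print(arena_size(tensors, offsets), peak_live(tensors))  # 13 13
```

For these intervals the greedy placement happens to reach the lower bound, so the arena is exactly as large as the peak number of live bytes. On other inputs first-fit can leave gaps that fragment memory and make the arena larger than that bound.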

Code & Models

Repositories

facebookresearch/olla
PyTorch · Official

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Advanced Neural Network Applications · Machine Learning and Data Classification