# Extreme Tensoring for Low-Memory Preconditioning

**Authors:** Xinyi Chen, Naman Agarwal, Elad Hazan, Cyril Zhang, Yi Zhang

arXiv: 1902.04620 · 2019-02-14

## TL;DR

This paper introduces extreme tensoring, a memory-efficient method for adaptive preconditioning in high-dimensional stochastic optimization. On a large-scale NLP model it cuts optimizer memory overhead by three orders of magnitude without degrading performance.

## Contribution

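The paper proposes extreme tensoring, a low-memory approach to adaptive preconditioning that applies to arbitrary models (not only those with tensor-shaped parameters). It comes with regret and convergence guarantees that shed light on the tradeoff between preconditioner quality and expressivity.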

## Key findings

- Reduces optimizer memory overhead by three orders of magnitude (roughly 1000x) on a large-scale NLP model
- Does so without degrading model performance
- Provides regret and convergence guarantees for the proposed method
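
To give a feel for where a saving of this size can come from, here is a small arithmetic sketch. It assumes, as one reading of tensor-factored preconditioning, that the optimizer keeps one accumulator per index along each axis of a reshaped parameter instead of one per coordinate. The function name `accumulator_counts` and the reshape `(16, 32, 16, 32)` are hypothetical and not taken from the paper.

```python
from math import prod


def accumulator_counts(tensor_shape):
    """Return (full_diagonal, factored) accumulator counts for one parameter.

    Assumption: a full diagonal preconditioner stores one value per coordinate,
    while a tensor-factored one stores one value per index along each axis.
    """
    full = prod(tensor_shape)     # one accumulator per coordinate
    factored = sum(tensor_shape)  # one accumulator per index on each axis
    return full, factored


# Hypothetical example: a 512 x 512 weight matrix viewed as a 4-way tensor.
full, factored = accumulator_counts((16, 32, 16, 32))
ratio = full / factored
print(f"full: {full}, factored: {factored}, ratio: {ratio:.0f}x")
```

Under these assumptions the accumulator count grows with the sum of the axis lengths rather than their product, so reshaping into more, shorter axes shrinks it further.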

## Abstract

State-of-the-art models are now trained with billions of parameters, reaching hardware limits in terms of memory consumption. This has created a recent demand for memory-efficient optimizers. To this end, we investigate the limits and performance tradeoffs of memory-efficient adaptively preconditioned gradient methods. We propose extreme tensoring for high-dimensional stochastic optimization, showing that an optimizer needs very little memory to benefit from adaptive preconditioning. Our technique applies to arbitrary models (not necessarily with tensor-shaped parameters), and is accompanied by regret and convergence guarantees, which shed light on the tradeoffs between preconditioner quality and expressivity. On a large-scale NLP model, we reduce the optimizer memory overhead by three orders of magnitude, without degrading performance.
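
The abstract does not spell out the update rule, so the following is only a minimal sketch of one way a tensor-factored, AdaGrad-style step could look. It assumes the parameter is reshaped into a `p`-way tensor, squared gradients are summed into one vector per axis, and each axis contributes a factor raised to the power `-1/(2p)`. The function `tensored_adagrad_step`, its arguments, and the toy objective are all hypothetical; consult the full paper for the authors' exact algorithm.

```python
import numpy as np


def tensored_adagrad_step(x, grad, accumulators, tensor_shape, lr=0.1, eps=1e-8):
    """One illustrative tensor-factored AdaGrad-style step (not the authors' code).

    accumulators: list of 1-D arrays, one per axis of tensor_shape, updated in place.
    """
    p = len(tensor_shape)
    g = grad.reshape(tensor_shape)
    g_sq = g ** 2
    scale = np.ones(tensor_shape)
    for axis, acc in enumerate(accumulators):
        # Sum squared gradients over every axis except the current one.
        other_axes = tuple(a for a in range(p) if a != axis)
        acc += g_sq.sum(axis=other_axes)
        # Broadcast this axis's factor across the whole tensor.
        view = [1] * p
        view[axis] = tensor_shape[axis]
        scale = scale * (eps + acc).reshape(view) ** (-1.0 / (2 * p))
    return x - lr * (scale * g).reshape(x.shape)


# Toy usage: minimize ||x||^2 for a 32 x 32 parameter viewed as a 4-way tensor.
rng = np.random.default_rng(0)
shape = (4, 8, 4, 8)  # hypothetical reshape; 4 * 8 * 4 * 8 == 32 * 32
x = rng.normal(size=(32, 32))
accs = [np.zeros(d) for d in shape]
for _ in range(3):
    grad = 2 * x  # gradient of ||x||^2
    x = tensored_adagrad_step(x, grad, accs, shape)
```

With a single axis (`p = 1`) this sketch reduces to ordinary diagonal AdaGrad, which is a useful sanity check. With four axes the optimizer state here is 24 numbers rather than 1024.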

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1902.04620/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/1902.04620/full.md

## References

22 references — full list in the complete paper: https://tomesphere.com/paper/1902.04620/full.md

---
Source: https://tomesphere.com/paper/1902.04620