An Isometric Stochastic Optimizer

Jacob Jackson

arXiv:2307.12979·cs.LG·July 25, 2023·1 cites

TL;DR

This paper introduces Iso, an optimizer derived from a proposed explanation of Adam's success. Iso makes the norm of each parameter's update invariant to linear transformations of that parameter's inputs and outputs. A variant, IsoAdam, obtains a speedup over Adam when training a small Transformer.

Contribution

The paper proposes Iso, an optimizer whose update norm is invariant to linear transformations of a parameter's inputs and outputs, and IsoAdam, a variant that allows optimal hyperparameters to be transferred from Adam. IsoAdam is shown to train a small Transformer faster than Adam.

Findings

01

IsoAdam obtains a speedup over Adam when training a small Transformer.

02

Iso makes the norm of a parameter's update invariant to any linear transformation applied to its inputs and outputs.

03

IsoAdam allows optimal hyperparameters to be transferred from Adam.

Abstract

The Adam optimizer is the standard choice in deep learning applications. I propose a simple explanation of Adam's success: it makes each parameter's step size independent of the norms of the other parameters. Based on this principle I derive Iso, a new optimizer which makes the norm of a parameter's update invariant to the application of any linear transformation to its inputs and outputs. I develop a variant of Iso called IsoAdam that allows optimal hyperparameters to be transferred from Adam, and demonstrate that IsoAdam obtains a speedup over Adam when training a small Transformer.
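
The principle attributed to Adam in the abstract can be illustrated with a small sketch. In a layered model, the gradient reaching one parameter is typically scaled by the magnitudes of the other parameters it is multiplied with, so the example below stands in for "the other parameters grew" with a hypothetical constant factor `scale` applied to a gradient sequence. The function names, learning rate, and gradient values are illustrative assumptions, and the code implements the standard Adam update, not Iso or IsoAdam.

```python
import math


def sgd_step(grad, lr=0.1):
    """Plain gradient-descent update for one scalar parameter."""
    return -lr * grad


def adam_steps(grads, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """Return the sequence of standard Adam updates for one scalar parameter."""
    m = 0.0  # running mean of gradients
    v = 0.0  # running mean of squared gradients
    updates = []
    for t, g in enumerate(grads, start=1):
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias correction
        v_hat = v / (1 - beta2 ** t)
        updates.append(-lr * m_hat / (math.sqrt(v_hat) + eps))
    return updates


# Illustrative gradient sequence (assumed values, not from the paper).
grads = [0.5, -0.2, 0.3, 0.1]

# Hypothetical factor standing in for a change in the norms of other parameters.
scale = 1000.0
scaled_grads = [scale * g for g in grads]

adam_a = adam_steps(grads)
adam_b = adam_steps(scaled_grads)
sgd_a = [sgd_step(g) for g in grads]
sgd_b = [sgd_step(g) for g in scaled_grads]

# Largest change in the Adam update caused by rescaling the gradients.
adam_max_diff = max(abs(a - b) for a, b in zip(adam_a, adam_b))
# Factor by which the first gradient-descent update grew.
sgd_ratio = sgd_b[0] / sgd_a[0]

print("max Adam update difference:", adam_max_diff)
print("SGD update ratio:", sgd_ratio)
```

Up to the small `eps` term, the Adam updates are unchanged by the rescaling, while the gradient-descent updates grow by the same factor as the gradients. This covers only rescaling by a scalar. The invariance the abstract claims for Iso is broader, covering any linear transformation of a parameter's inputs and outputs.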

Taxonomy

Topics: Neural Networks and Applications · Image and Signal Denoising Methods · Model Reduction and Neural Networks

Methods: Multi-Head Attention · Attention Is All You Need · Label Smoothing · Layer Normalization · Absolute Position Encodings · Linear Layer · Softmax · Dense Connections · Dropout · Residual Connection