A second-order-like optimizer with adaptive gradient scaling for deep learning

Jérôme Bolte (TSE-R), Ryan Boustany (TSE-R), Edouard Pauwels (TSE-R, IRIT-ADRIA), Andrei Purica

arXiv:2410.05871 · cs.LG · December 13, 2024



TL;DR

This paper introduces INNAprop, a second-order-like optimizer with adaptive gradient scaling that improves training efficiency and accuracy in deep learning models, including CNNs, ViT, and GPT-2, with minimal hyperparameter tuning.

Contribution

It presents INNAprop, a novel optimizer that combines second-order information with adaptive gradient scaling, keeping memory requirements comparable to AdamW or SGD with momentum while improving training performance.

Findings

01. INNAprop consistently matches or outperforms AdamW in training speed and accuracy.

02. It works effectively across various architectures (CNNs, ViT, GPT-2) and datasets.

03. Minimal hyperparameter tuning is required in large-scale training settings.

Abstract

In this empirical article, we introduce INNAprop, an optimization algorithm that combines the INNA method with RMSprop adaptive gradient scaling. It leverages second-order information and rescaling while keeping the memory requirements of standard DL methods such as AdamW or SGD with momentum. After giving geometrical insights, we evaluate INNAprop on CIFAR-10, Food101, and ImageNet with ResNets, VGG, DenseNet, and ViT, and on GPT-2, both trained from scratch (OpenWebText) and fine-tuned with LoRA (E2E). INNAprop consistently matches or outperforms AdamW in both training speed and accuracy, with minimal hyperparameter tuning in large-scale settings. Our code is publicly available at https://github.com/innaprop/innaprop.
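To make the combination of INNA and RMSprop scaling more concrete, here is a minimal pure-Python sketch under stated assumptions. It is not the authors' implementation. It assumes the two-variable INNA discretization in (theta, psi), which discretizes the inertial Newton dynamics theta'' + alpha*theta' + beta*Hess(theta)*theta' + grad(theta) = 0 without forming the Hessian, and it feeds that update an RMSprop-rescaled gradient. The Adam-style bias correction, the initialization psi0 = (1 - alpha*beta)*theta0, the default hyperparameter values, and the class name `INNApropSketch` are all hypothetical choices for illustration. Weight decay and any implementation details of the official PyTorch code are omitted.

```python
import math
from typing import List


class INNApropSketch:
    """Hypothetical, simplified INNA + RMSprop step (NOT the official INNAprop code).

    Assumed INNA discretization, applied elementwise with rescaled gradient s:
        theta <- theta + lr * ((1/beta - alpha) * theta - psi / beta - beta * s)
        psi   <- psi   + lr * ((1/beta - alpha) * theta - psi / beta)
    """

    def __init__(self, theta0: List[float], lr: float = 0.01, alpha: float = 0.1,
                 beta: float = 0.9, beta2: float = 0.99, eps: float = 1e-8) -> None:
        self.theta = list(theta0)
        # Assumed initialization: psi is stationary at step 0, so the
        # first move is a pure (rescaled) gradient step of size lr * beta.
        self.psi = [(1.0 - alpha * beta) * t for t in theta0]
        self.v = [0.0] * len(theta0)  # RMSprop running mean of squared gradients
        self.k = 0
        self.lr, self.alpha, self.beta = lr, alpha, beta
        self.beta2, self.eps = beta2, eps

    def step(self, grad: List[float]) -> None:
        self.k += 1
        bias = 1.0 - self.beta2 ** self.k  # Adam-style bias correction (assumption)
        for i, g in enumerate(grad):
            self.v[i] = self.beta2 * self.v[i] + (1.0 - self.beta2) * g * g
            s = g / (math.sqrt(self.v[i] / bias) + self.eps)  # rescaled gradient
            # Shared term computed from the current (theta, psi) before updating.
            d = (1.0 / self.beta - self.alpha) * self.theta[i] - self.psi[i] / self.beta
            self.theta[i] += self.lr * (d - self.beta * s)
            self.psi[i] += self.lr * d


# Usage: minimize f(theta) = 0.5 * (theta - 3)^2, whose gradient is theta - 3.
opt = INNApropSketch([1.0])
for _ in range(100):
    opt.step([opt.theta[0] - 3.0])
print(opt.theta[0])
```

Like AdamW, which stores two moment buffers, this sketch keeps two extra buffers per parameter (psi and v) beyond the weights themselves, which is in line with the memory claim above.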


Code & Models

Repositories

innaprop/innaprop (PyTorch, official)


Taxonomy

Topics: Machine Learning and ELM · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications

Methods: Attention Is All You Need · Layer Normalization · Adam · Attention Dropout · Cosine Annealing · Byte Pair Encoding · Residual Connection · Linear Layer · Linear Warmup With Cosine Annealing