A second-order-like optimizer with adaptive gradient scaling for deep learning
Jérôme Bolte (TSE-R), Ryan Boustany (TSE-R), Edouard Pauwels (TSE-R, IRIT-ADRIA), Andrei Purica

TL;DR
This paper introduces INNAprop, a second-order-like optimizer with adaptive gradient scaling that improves training efficiency and accuracy in deep learning models, including CNNs, ViT, and GPT-2, with minimal hyperparameter tuning.
Contribution
It presents INNAprop, a novel optimizer combining second-order information with adaptive gradient scaling, maintaining low memory usage while enhancing training performance.
Findings
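INNAprop consistently matches or outperforms AdamW in both training speed and accuracy.
It is evaluated on CIFAR-10, Food101, and ImageNet with ResNets, VGG, DenseNet, and ViT, and on GPT-2, both trained from scratch (OpenWebText) and fine-tuned with LoRA (E2E).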
Minimal hyperparameter tuning is required for large-scale training.
Abstract
In this empirical article, we introduce INNAprop, an optimization algorithm that combines the INNA method with RMSprop adaptive gradient scaling. It leverages second-order information and rescaling while keeping the memory requirements of standard DL methods such as AdamW or SGD with momentum. After giving geometrical insights, we evaluate INNAprop on CIFAR-10, Food101, and ImageNet with ResNets, VGG, DenseNet, and ViT, and on GPT-2, both trained from scratch (OpenWebText) and fine-tuned with LoRA (E2E). INNAprop consistently matches or outperforms AdamW in both training speed and accuracy, with minimal hyperparameter tuning in large-scale settings. Our code is publicly available at https://github.com/innaprop/innaprop.
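The abstract names RMSprop adaptive gradient scaling as one of the two ingredients of INNAprop. As a rough illustration of that ingredient alone, the sketch below applies a standard RMSprop-style rescaling to a gradient vector in plain Python. The function name `rmsprop_scale` and the default values of `decay` and `eps` are assumptions chosen for illustration; they are not taken from the paper, and the INNA part of the update is not shown here.

```python
import math

def rmsprop_scale(grads, sq_avg, decay=0.99, eps=1e-8):
    """One RMSprop-style rescaling step (illustrative sketch, not INNAprop itself).

    grads  : list of gradient coordinates
    sq_avg : running average of squared gradients, same length as grads
    Returns the rescaled gradients and the updated running average.
    """
    # Exponential moving average of the squared gradient, per coordinate.
    new_sq_avg = [decay * v + (1.0 - decay) * g * g
                  for v, g in zip(sq_avg, grads)]
    # Divide each gradient coordinate by the root of its running average.
    scaled = [g / (math.sqrt(v) + eps)
              for g, v in zip(grads, new_sq_avg)]
    return scaled, new_sq_avg

# Hypothetical gradient values, used only to exercise the function.
grads = [0.5, -2.0, 0.0]
sq_avg = [0.0, 0.0, 0.0]
scaled, sq_avg = rmsprop_scale(grads, sq_avg)
print(scaled)
```

In this sketch, coordinates with very different raw gradient magnitudes end up with comparable scaled magnitudes, which is the per-coordinate normalization that adaptive methods of this family rely on. The only extra state is one running average per parameter, in line with the abstract's remark that memory requirements stay at the level of AdamW or SGD with momentum.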
Taxonomy
Topics: Machine Learning and ELM · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications
Methods: Attention Is All You Need · Layer Normalization · Adam · Attention Dropout · Cosine Annealing · Byte Pair Encoding · Residual Connection · Linear Layer · Linear Warmup With Cosine Annealing
