Beyond adaptive gradient: Fast-Controlled Minibatch Algorithm for large-scale optimization

Corrado Coppola; Lorenzo Papa; Irene Amerini; Laura Palagi

arXiv:2411.15795 · cs.LG · December 17, 2024

TL;DR

This paper introduces F-CMA, a new mini-batch optimization algorithm that combines random reshuffling with a line-search procedure and comes with convergence guarantees; in the reported experiments it reduces training time and improves accuracy for deep learning models.

Contribution

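F-CMA is a new optimization method that addresses two limitations of adaptive gradient methods, the extra memory needed for moving averages and a poorly understood convergence theory, by using a line-search approach backed by a proof of global convergence to a stationary point.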

Findings

1. Training time reduced by up to 68%.
2. Per-epoch efficiency increased by up to 20%.
3. Model accuracy improved by up to 5%.

Abstract

Adaptive gradient methods have been increasingly adopted by the deep learning community due to their fast convergence and reduced sensitivity to hyper-parameters. However, these methods come with limitations, such as increased memory requirements for quantities like moving averages and a poorly understood convergence theory. To overcome these challenges, we introduce F-CMA, a Fast-Controlled Mini-batch Algorithm with a random reshuffling method featuring a sufficient decrease condition and a line-search procedure to ensure loss reduction per epoch, along with its deterministic proof of global convergence to a stationary point. To evaluate F-CMA, we integrate it into conventional training protocols for classification tasks involving both convolutional neural networks and vision transformer models, allowing for a direct comparison with popular optimizers. Computational tests show…
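The core mechanism named in the abstract (a random-reshuffling pass over mini-batches, followed by a sufficient decrease test and a line search so the loss does not increase from one epoch to the next) can be illustrated with a toy sketch. This is not the authors' F-CMA. The least-squares objective, the acceptance test f(w + αd) ≤ f(w) − γα²‖d‖², the backtracking factor δ, and halving the step size on failure are all assumptions chosen for illustration, and `controlled_epoch` is a hypothetical helper. The official repository contains the actual algorithm.

```python
import random

# Toy objective (assumption): mean of 0.5 * (a . w - b)^2 over the full dataset.
def loss(w, data):
    total = 0.0
    for a, b in data:
        r = sum(ai * wi for ai, wi in zip(a, w)) - b
        total += 0.5 * r * r
    return total / len(data)

# Gradient of the same objective restricted to one mini-batch.
def grad(w, batch):
    g = [0.0] * len(w)
    for a, b in batch:
        r = sum(ai * wi for ai, wi in zip(a, w)) - b
        for j, aj in enumerate(a):
            g[j] += r * aj / len(batch)
    return g

def controlled_epoch(w, data, zeta, rng, batch_size=2, gamma=1e-4, delta=0.5, max_ls=20):
    """Hypothetical helper: one reshuffled epoch plus an epoch-level line search.

    Returns the new iterate and the (possibly reduced) mini-batch step size.
    """
    idx = list(range(len(data)))
    rng.shuffle(idx)  # random reshuffling: every sample is used once per epoch
    w_trial = list(w)
    for start in range(0, len(idx), batch_size):
        batch = [data[i] for i in idx[start:start + batch_size]]
        g = grad(w_trial, batch)
        w_trial = [wi - zeta * gi for wi, gi in zip(w_trial, g)]

    d = [wt - wi for wt, wi in zip(w_trial, w)]  # net displacement of the epoch
    d_sq = sum(di * di for di in d)
    f0 = loss(w, data)
    alpha = 1.0
    for _ in range(max_ls):
        cand = [wi + alpha * di for wi, di in zip(w, d)]
        # Assumed sufficient decrease test: f(w + a*d) <= f(w) - gamma * a^2 * ||d||^2
        if loss(cand, data) <= f0 - gamma * alpha * alpha * d_sq:
            return cand, zeta
        alpha *= delta  # backtrack along the epoch direction
    # No acceptable point found: keep w and shrink the mini-batch step size.
    return list(w), zeta * delta

# Usage: a consistent 2-D least-squares problem whose exact solution is w = (2, -1).
data = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0), ([2.0, 1.0], 3.0)]
w, zeta = [0.0, 0.0], 0.5
rng = random.Random(42)
history = [loss(w, data)]
for _ in range(30):
    w, zeta = controlled_epoch(w, data, zeta, rng)
    history.append(loss(w, data))
print(f"loss: {history[0]:.4f} -> {history[-1]:.2e}, w = {w}")
```

Because a trial epoch is accepted only when the full-dataset loss drops by a margin, the recorded losses are non-increasing by construction. The price is extra full-dataset loss evaluations per epoch (more during backtracking). The paper's actual acceptance rule, step-size update, and any additional steps may differ from this sketch.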

Code & Models

Repositories

corradocoppola97/Fast_CMA (PyTorch, official)

Taxonomy

Topics: Metaheuristic Optimization Algorithms Research · Neural Networks and Applications

Methods: Linear Layer · Residual Connection · Softmax · Attention Is All You Need · Multi-Head Attention · Dense Connections · Layer Normalization · Vision Transformer