Training Aware Sigmoidal Optimizer

David Macêdo; Pedro Dreyer; Teresa Ludermir; Cleber Zanchettin

arXiv:2102.08716 · cs.LG · February 18, 2021 · 1 citation

Open Access

TL;DR

The paper introduces TASO, a two-phase automated learning rate schedule for deep neural networks. In the authors' experiments it outperforms the adaptive optimizers Adam, RMSProp, and Adagrad, with both tuned and default hyperparameters.

Contribution

It proposes a two-phase learning rate schedule designed for neural network loss landscapes, in which saddle points far outnumber local minima, with the aim of improving training efficiency and performance.

Findings

01 TASO outperforms Adam, RMSProp, and Adagrad in experiments.

02 TASO is effective in both hyperparameter-tuned and default settings.

03 The two-phase approach accelerates training and improves convergence.

Abstract

Proper optimization of deep neural networks is an open research question, since an optimal procedure for changing the learning rate throughout training is still unknown. Manually defining a learning rate schedule involves troublesome, time-consuming trial-and-error procedures to determine hyperparameters such as learning rate decay epochs and learning rate decay rates. Although adaptive learning rate optimizers automate this process, recent studies suggest they may produce overfitting and reduce performance when compared to fine-tuned learning rate schedules. Considering that deep neural network loss functions present landscapes with many more saddle points than local minima, we propose the Training Aware Sigmoidal Optimizer (TASO), which consists of a two-phase automated learning rate schedule. The first phase uses a high learning rate to quickly traverse the numerous saddle points, while…
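
The two-phase idea can be illustrated with a small sketch. The abstract is cut off before it describes the second phase, so everything beyond "a high learning rate first" is an assumption here: the lower second-phase rate and the sigmoid-shaped transition are suggested only by the optimizer's name. The function `sigmoidal_lr` and its parameters (`lr_high`, `lr_low`, `steepness`, `midpoint`) are hypothetical names with illustrative values, not the paper's formula or settings.

```python
import math


def sigmoidal_lr(step, total_steps, lr_high=0.1, lr_low=0.001,
                 steepness=10.0, midpoint=0.5):
    """Hypothetical sigmoid-shaped schedule (an assumption, not the paper's formula).

    Early in training the rate stays near lr_high; after the midpoint it
    decays smoothly toward lr_low.
    """
    progress = step / total_steps  # fraction of training completed, in [0, 1]
    # Logistic gate: close to 1 before the midpoint, close to 0 after it.
    gate = 1.0 / (1.0 + math.exp(steepness * (progress - midpoint)))
    return lr_low + (lr_high - lr_low) * gate


if __name__ == "__main__":
    total = 100
    for step in (0, 25, 50, 75, 100):
        print(step, sigmoidal_lr(step, total))
```

With these illustrative defaults the rate stays close to `lr_high` early on, equals the average of the two rates at the midpoint, and settles near `lr_low` by the end of training.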

Taxonomy

Topics: Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques · Machine Learning and Data Classification

Methods: Attentive Walk-Aggregating Graph Neural Network · Adam · RMSProp