Tom: Leveraging trend of the observed gradients for faster convergence
Anirudh Maiya, Inumella Sricharan, Anshuman Pandey, Srinivas K. S

TL;DR
The paper introduces Tom, a new optimizer that leverages observed gradient trends for improved convergence and accuracy in deep learning, outperforming existing adaptive optimizers on standard image classification datasets.
Contribution
Tom is a novel variant of Adam that incorporates the trend of the observed gradients into its update, improving convergence speed and accuracy; the smoothing parameter it adds requires no tuning.
Findings
Tom outperforms AdaGrad, AdaDelta, RMSProp, and Adam in accuracy.
Tom achieves faster convergence on CIFAR-10, CIFAR-100, and CINIC-10.
The smoothing parameter in Tom requires no tuning.
Abstract
The success of deep learning can be attributed to various factors such as increases in computational power, large datasets, deep convolutional neural networks, and optimizers. In particular, the choice of optimizer affects generalization, convergence rate, and training stability. Stochastic Gradient Descent (SGD) is a first-order iterative optimizer that applies the same learning rate uniformly to all parameters. This uniform update may not be suitable across the entire training phase. A rudimentary solution is to employ a fine-tuned learning rate scheduler that decreases the learning rate as a function of the iteration count. To eliminate the dependence on learning rate schedulers, adaptive gradient optimizers such as AdaGrad, AdaDelta, RMSProp, and Adam scale the learning rate per parameter using a function of the gradients themselves. We propose Tom (Trend over Momentum) optimizer,…
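The abstract contrasts three ways of setting the step size: a single learning rate shared by all parameters (SGD), a scheduler that shrinks that rate over iterations, and a per-parameter scale computed from the gradients (Adam and related methods). The minimal sketch below illustrates these three ideas; it does not implement Tom, whose update rule is not given in this summary. The step-decay schedule and its `drop`/`every` values are hypothetical choices for illustration, and the Adam defaults follow the original Adam paper (Kingma & Ba).

```python
import math


def sgd_step(params, grads, lr):
    """Plain SGD: every parameter is moved by the same learning rate times its gradient."""
    return [p - lr * g for p, g in zip(params, grads)]


def step_decay(base_lr, iteration, drop=0.1, every=100):
    """Hypothetical step-decay schedule: multiply the rate by `drop` every `every` iterations."""
    return base_lr * (drop ** (iteration // every))


class Adam:
    """Standard Adam: per-parameter step sizes derived from running gradient moments."""

    def __init__(self, n, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = [0.0] * n  # running mean of gradients (first moment)
        self.v = [0.0] * n  # running mean of squared gradients (second moment)
        self.t = 0          # step counter used for bias correction

    def step(self, params, grads):
        self.t += 1
        updated = []
        for i, (p, g) in enumerate(zip(params, grads)):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            # Bias-corrected moment estimates (both start at zero).
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            # Dividing by sqrt(v_hat) gives each parameter its own effective step size.
            updated.append(p - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return updated


# Gradient of f(x, y) = x**2 + 100 * y**2 at (1, 1) is (2, 200).
params = [1.0, 1.0]
grads = [2.0, 200.0]
sgd_next = sgd_step(params, grads, lr=0.004)         # approximately [0.992, 0.2]
adam_next = Adam(n=2, lr=0.1).step(params, grads)    # approximately [0.9, 0.9]
```

With one shared rate, SGD moves the steep coordinate 100 times farther than the shallow one, so a rate small enough for `y` is very slow for `x`. On its first step Adam moves both coordinates by roughly `lr`, because the bias-corrected ratio `m_hat / sqrt(v_hat)` is close to the sign of the gradient; this per-parameter normalization is what removes the need for a hand-tuned scheduler.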
Taxonomy
Topics: Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques · Medical Image Segmentation Techniques
Methods: AdaGrad · Adam · AdaDelta · RMSProp
