A Comparison of Optimization Algorithms for Deep Learning
Derya Soydaner

TL;DR
This paper compares optimization algorithms for deep learning, especially adaptive gradient methods, on four image datasets, highlighting differences in their training behaviors and performance.
Contribution
It provides a detailed comparison of widely used adaptive optimization algorithms for deep learning on both supervised and unsupervised tasks.
Findings
Adaptive gradient methods show distinct training behaviors.
Performance varies across datasets and algorithms.
Some algorithms outperform basic optimizers on specific tasks.
Abstract
In recent years, we have witnessed the rise of deep learning. Deep neural networks have proved their success in many areas. However, the optimization of these networks has become more difficult as neural networks go deeper and datasets become bigger. Therefore, more advanced optimization algorithms have been proposed over the past years. In this study, widely used optimization algorithms for deep learning are examined in detail. To this end, these algorithms, called adaptive gradient methods, are implemented for both supervised and unsupervised tasks. Their behaviour during training and their results on four image datasets, namely MNIST, CIFAR-10, Kaggle Flowers and Labeled Faces in the Wild, are compared, pointing out how they differ from basic optimization algorithms.
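The abstract contrasts adaptive gradient methods with basic optimizers but does not spell out what "adaptive" means. As a minimal sketch under our own assumptions (this is not the paper's code or experimental setup), the snippet below compares plain full-batch gradient descent with Adam, one widely used adaptive method, on a hypothetical ill-conditioned 2-D quadratic. The objective, step sizes and step counts are illustrative choices only.

```python
import math

# Hypothetical toy objective (not from the paper): an ill-conditioned quadratic
# f(w) = 0.5 * (100 * w0**2 + 1 * w1**2) with its minimum at (0, 0).
CURVATURE = (100.0, 1.0)


def loss(w):
    return 0.5 * sum(c * x * x for c, x in zip(CURVATURE, w))


def grad(w):
    return [c * x for c, x in zip(CURVATURE, w)]


def gd(w, steps, lr):
    """Basic (full-batch) gradient descent: one global step size for all parameters."""
    w = list(w)
    for _ in range(steps):
        g = grad(w)
        w = [x - lr * gi for x, gi in zip(w, g)]
    return w


def adam(w, steps, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: each parameter's step is scaled by its own gradient statistics."""
    w = list(w)
    m = [0.0] * len(w)  # running mean of gradients (first moment)
    v = [0.0] * len(w)  # running mean of squared gradients (second moment)
    for t in range(1, steps + 1):
        g = grad(w)
        m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, g)]
        v = [beta2 * vi + (1 - beta2) * gi * gi for vi, gi in zip(v, g)]
        # Bias correction compensates for initializing m and v at zero.
        m_hat = [mi / (1 - beta1 ** t) for mi in m]
        v_hat = [vi / (1 - beta2 ** t) for vi in v]
        w = [x - lr * mh / (math.sqrt(vh) + eps)
             for x, mh, vh in zip(w, m_hat, v_hat)]
    return w


start = [1.0, 1.0]
# Gradient descent diverges along the steep axis if lr >= 2 / 100,
# so the flat axis (curvature 1) is forced to move slowly.
w_gd = gd(start, steps=100, lr=0.019)
w_adam = adam(start, steps=100, lr=0.1)
print("start:", loss(start), "gd:", loss(w_gd), "adam:", loss(w_adam))
print("gd w:", w_gd, "adam w:", w_adam)
```

With these settings, gradient descent shrinks the steep coordinate by a factor of 0.9 per step but the flat one only by 0.981, so the flat direction lags behind. Adam's per-coordinate normalization makes both coordinates follow the same trajectory (up to `eps`). This illustrates only the "adaptive" step-size mechanism; it says nothing about which method performs better on the datasets studied in the paper.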
