Do optimization methods in deep learning applications matter?
Buse Melis Ozyildirim (1), Mariam Kiran (2) ((1) Department of Computer Engineering, Cukurova University; (2) Energy Sciences Network, Lawrence Berkeley National Laboratory)

TL;DR
This paper compares various optimization methods in deep learning, highlighting the trade-offs between convergence speed and computational complexity across different algorithms and applications.
Contribution
It provides an experimental comparison of first and higher-order optimization functions, analyzing their performance and computational costs in deep learning tasks.
Findings
Levenberg-Marquardt (LM) outperforms the other methods in convergence speed
LM has a significantly higher training time
CG and SGD are more practical for large-scale applications
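The trade-off in these findings can be sketched on a toy problem. The example below is not the authors' experimental setup: the curve-fitting task, the function names, the learning rate, and the damping schedule are all assumptions chosen for illustration. It fits a two-parameter model with plain gradient descent (first-order) and with a basic Levenberg-Marquardt loop, giving both the same number of iterations.

```python
import numpy as np

# Toy data (an assumption for illustration): y = a * exp(b * x) with a=2, b=-1.
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * np.exp(-1.0 * x)

def residuals(w):
    a, b = w
    return a * np.exp(b * x) - y

def jacobian(w):
    # Shape (N, P): one row per sample, one column per parameter.
    a, b = w
    e = np.exp(b * x)
    return np.column_stack([e, a * x * e])

def loss(w):
    r = residuals(w)
    return 0.5 * float(r @ r)

def gradient_descent(w0, lr=0.01, steps=20):
    # First-order update: needs only the gradient J^T r at each step.
    w = np.array(w0, dtype=float)
    for _ in range(steps):
        w = w - lr * (jacobian(w).T @ residuals(w))
    return w

def levenberg_marquardt(w0, lam=1e-2, steps=20):
    # Each step builds the P x P matrix J^T J and solves a linear system,
    # which is what makes LM costly when the parameter count P is large.
    w = np.array(w0, dtype=float)
    for _ in range(steps):
        J, r = jacobian(w), residuals(w)
        delta = np.linalg.solve(J.T @ J + lam * np.eye(len(w)), -J.T @ r)
        if loss(w + delta) < loss(w):
            w, lam = w + delta, lam / 10.0   # accept step, trust the model more
        else:
            lam *= 10.0                      # reject step, increase damping
    return w

w0 = [1.0, 0.0]
gd_loss = loss(gradient_descent(w0))
lm_loss = loss(levenberg_marquardt(w0))
print(f"gradient descent loss: {gd_loss:.3e}")
print(f"Levenberg-Marquardt loss: {lm_loss:.3e}")
```

With only two parameters the linear solve is trivial, so this sketch shows the convergence side of the trade-off only. For a network with P parameters, storing the N x P Jacobian and solving a P x P system at every step is where the extra training time comes from.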
Abstract
With advances in deep learning, exponential data growth, and increasing model complexity, the development of efficient optimization methods is attracting much research attention. Several implementations favor Conjugate Gradient (CG) and Stochastic Gradient Descent (SGD) as practical and elegant solutions for achieving quick convergence; however, these optimization processes also present many limitations in learning across deep learning applications. Recent research is exploring higher-order optimization functions as better approaches, but these present very complex computational challenges for practical use. Comparing first- and higher-order optimization functions, our experiments in this paper reveal that Levenberg-Marquardt (LM) achieves significantly better convergence but suffers from very large processing times, increasing the training complexity of both classification and…
Taxonomy
Topics: Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques · Adversarial Robustness in Machine Learning
Methods: Stochastic Gradient Descent
