Do optimization methods in deep learning applications matter?

Buse Melis Ozyildirim (1), Mariam Kiran (2) ((1) Department of Computer Engineering, Cukurova University; (2) Energy Sciences Network, Lawrence Berkeley National Laboratory)

arXiv:2002.12642 · cs.LG · March 2, 2020 · 5 citations

TL;DR

This paper compares various optimization methods in deep learning, highlighting the trade-offs between convergence speed and computational complexity across different algorithms and applications.

Contribution

It provides an experimental comparison of first- and higher-order optimization methods, analyzing their performance and computational cost in deep learning tasks.
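
For reference, the textbook update rules of the two families being compared are given below. These are the standard forms, not necessarily the exact variants or hyperparameters used in the paper. Here η is the learning rate and λ ≥ 0 is the LM damping factor: as λ → 0 the LM step becomes a Gauss-Newton step, while a large λ gives a short step of roughly −(1/λ) Jᵀr, i.e. gradient descent.

```latex
% First order (SGD): step along the gradient of the loss on a sampled example i_t
\theta_{t+1} = \theta_t - \eta \, \nabla_\theta \ell_{i_t}(\theta_t)

% Levenberg-Marquardt, for a loss L(\theta) = \tfrac{1}{2}\lVert r(\theta) \rVert^2
% with residual vector r and Jacobian J = \partial r / \partial \theta;
% note \nabla_\theta L = J^\top r
\theta_{t+1} = \theta_t - \left( J^\top J + \lambda I \right)^{-1} J^\top r(\theta_t)
```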

Findings

01

Levenberg-Marquardt (LM) outperforms the other methods in convergence speed

02

LM requires significantly longer training time

03

CG and SGD are more practical for large-scale applications
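
The first two findings describe a trade-off that can be reproduced on a toy problem. The sketch below assumes a hypothetical two-parameter curve fit, y = a·exp(b·x), rather than any experiment from the paper, and counts iterations for plain gradient descent and a basic Levenberg-Marquardt loop. The data, starting point, learning rate, and damping schedule are illustrative choices only. LM typically needs far fewer iterations, but each one solves a linear system whose size equals the number of parameters.

```python
import numpy as np

# Hypothetical toy problem: fit y = a * exp(b * x) to noise-free data (a=2.0, b=-1.5).
x = np.linspace(0.0, 2.0, 50)
y = 2.0 * np.exp(-1.5 * x)

def residuals(p):
    a, b = p
    return a * np.exp(b * x) - y

def jacobian(p):
    a, b = p
    e = np.exp(b * x)
    return np.column_stack([e, a * x * e])  # columns: dr/da, dr/db

def loss(p):
    r = residuals(p)
    return 0.5 * r @ r

def gradient_descent(p, lr=0.01, tol=1e-10, max_iter=100_000):
    """First-order baseline: fixed-step descent on 0.5 * ||r||^2."""
    for k in range(max_iter):
        g = jacobian(p).T @ residuals(p)  # gradient of 0.5 * ||r||^2 is J^T r
        if np.linalg.norm(g) < tol:
            return p, k
        p = p - lr * g
    return p, max_iter

def levenberg_marquardt(p, lam=1e-2, tol=1e-10, max_iter=1000):
    """Basic LM with a simple multiplicative damping schedule (illustrative)."""
    for k in range(max_iter):
        r, J = residuals(p), jacobian(p)
        g = J.T @ r
        if np.linalg.norm(g) < tol:
            return p, k
        # Damped normal equations (J^T J + lam I) delta = -J^T r:
        # this solve is the extra per-iteration cost that LM pays.
        delta = np.linalg.solve(J.T @ J + lam * np.eye(p.size), -g)
        if loss(p + delta) < loss(p):
            p, lam = p + delta, lam * 0.5  # accept: lean toward Gauss-Newton
        else:
            lam *= 10.0                    # reject: lean toward gradient descent
    return p, max_iter

p0 = np.array([1.0, -0.5])
p_gd, n_gd = gradient_descent(p0)
p_lm, n_lm = levenberg_marquardt(p0)
print(f"GD: {n_gd} iterations, LM: {n_lm} iterations")
```

Iteration counts alone understate LM's cost: with n residuals and p parameters, forming JᵀJ costs O(n p²) and the solve O(p³), against O(n p) for a gradient step, which becomes prohibitive when p is the number of weights in a deep network.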

Abstract

With advances in deep learning, exponential data growth, and increasing model complexity, developing efficient optimization methods is attracting much research attention. Several implementations favor Conjugate Gradient (CG) and Stochastic Gradient Descent (SGD) as practical and elegant solutions for achieving quick convergence; however, these optimization processes also present many limitations in learning across deep learning applications. Recent research is exploring higher-order optimization functions as better approaches, but these pose complex computational challenges for practical use. Comparing first- and higher-order optimization functions, our experiments reveal that Levenberg-Marquardt (LM) significantly outperforms the others in convergence but suffers from very long processing times, increasing the training complexity of both classification and…
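
Part of CG's practicality is that it never forms or inverts a curvature matrix: each iteration needs one matrix-vector product plus a few vector operations. A minimal sketch of linear CG on a quadratic with a symmetric positive definite matrix A is shown below. The function name and test values are hypothetical, and training networks with CG usually relies on nonlinear variants (e.g. Fletcher-Reeves), which replace the products with A by loss gradients and line searches; those are not shown.

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Minimize 0.5 * x^T A x - b^T x (equivalently solve A x = b) for SPD A."""
    n = b.size
    max_iter = n if max_iter is None else max_iter
    x = np.zeros(n)
    r = b - A @ x              # residual = negative gradient of the quadratic
    d = r.copy()               # first direction is steepest descent
    rs_old = r @ r
    for k in range(max_iter):
        if np.sqrt(rs_old) < tol:
            return x, k
        Ad = A @ d             # the only matrix operation per iteration
        alpha = rs_old / (d @ Ad)  # exact line search along d
        x = x + alpha * d
        r = r - alpha * Ad
        rs_new = r @ r
        d = r + (rs_new / rs_old) * d  # next direction, A-conjugate to earlier ones
        rs_old = rs_new
    return x, max_iter

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])
x_cg, n_cg = conjugate_gradient(A, b)
```

In exact arithmetic, linear CG reaches the minimizer of an n-dimensional quadratic in at most n iterations, while storing only a few vectors of length n.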

Taxonomy

Topics: Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques · Adversarial Robustness in Machine Learning

Methods: Stochastic Gradient Descent