High Dimensional Optimization through the Lens of Machine Learning

Felix Benning

arXiv:2112.15392·math.OC·January 3, 2022

TL;DR

This thesis reviews numerical optimization methods suited to high-dimensional machine learning problems. It builds intuition on quadratic models, proves convergence on convex functions for stochastic gradient descent and momentum methods, and uses this foundation to explain why the optimizers commonly used in practice are so successful.

Contribution

The thesis offers a theoretical foundation for the high-dimensional optimization methods used in machine learning, with the aim of explaining the success of popular heuristics and default optimizers.

Findings

01 Convergence proofs on convex functions for stochastic gradient descent and momentum methods.

02 Intuition built on quadratic models to judge which methods are suited for non-convex optimization.

03 An attempt to explain why the heuristics commonly used in machine learning are effective in practice.

Abstract

This thesis reviews numerical optimization methods with machine learning problems in mind. Since machine learning models are highly parametrized, we focus on methods suited for high dimensional optimization. We build intuition on quadratic models to figure out which methods are suited for non-convex optimization, and develop convergence proofs on convex functions for this selection of methods. With this theoretical foundation for stochastic gradient descent and momentum methods, we try to explain why the methods used commonly in the machine learning field are so successful. Besides explaining successful heuristics, the last chapter also provides a less extensive review of more theoretical methods, which are not quite as popular in practice. So in some sense this work attempts to answer the question: Why are the default Tensorflow optimizers included in the defaults?
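To make the role of quadratic models concrete, the following is a minimal sketch (not taken from the thesis) that compares plain gradient descent with heavy-ball momentum on an ill-conditioned two-dimensional quadratic. The objective, the curvatures `h`, the starting point, and the step count are all assumptions chosen for illustration; the step size and momentum coefficient are the standard textbook tuning for quadratics with known smallest and largest curvature.

```python
import math

def grad(x, h):
    # Gradient of f(x) = 0.5 * sum(h_i * x_i^2), a diagonal quadratic.
    return [hi * xi for hi, xi in zip(h, x)]

def gradient_descent(x0, h, lr, steps):
    x = list(x0)
    for _ in range(steps):
        g = grad(x, h)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x

def heavy_ball(x0, h, lr, beta, steps):
    # Heavy-ball momentum: x_{k+1} = x_k - lr * grad(x_k) + beta * (x_k - x_{k-1}).
    x, x_prev = list(x0), list(x0)
    for _ in range(steps):
        g = grad(x, h)
        x_next = [xi - lr * gi + beta * (xi - pi)
                  for xi, gi, pi in zip(x, g, x_prev)]
        x_prev, x = x, x_next
    return x

def norm(x):
    return math.sqrt(sum(xi * xi for xi in x))

# Hypothetical problem: curvatures 1 and 100 (condition number 100), minimum at 0.
h = [1.0, 100.0]
mu, L = min(h), max(h)
x0 = [1.0, 1.0]
steps = 100

# Textbook parameter choices for a quadratic with curvature in [mu, L].
gd_lr = 2.0 / (L + mu)
hb_lr = 4.0 / (math.sqrt(L) + math.sqrt(mu)) ** 2
hb_beta = ((math.sqrt(L) - math.sqrt(mu)) / (math.sqrt(L) + math.sqrt(mu))) ** 2

gd_dist = norm(gradient_descent(x0, h, gd_lr, steps))
hb_dist = norm(heavy_ball(x0, h, hb_lr, hb_beta, steps))

print(f"distance to minimum after {steps} steps")
print(f"  gradient descent: {gd_dist:.3e}")
print(f"  heavy ball:       {hb_dist:.3e}")
```

On this toy problem the momentum iterate ends up several orders of magnitude closer to the minimum than plain gradient descent after the same number of steps, which illustrates the kind of conditioning effect that quadratic models make visible. The sketch is deterministic; it does not model the gradient noise that the stochastic setting adds.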

Code & Models

Repositories

felixbenning/masterthesis · tf · Official

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Applications