Scalable Nested Optimization for Deep Learning

Jonathan Lorraine

arXiv:2407.01526·cs.LG·July 2, 2024

TL;DR

This thesis introduces methods for nested optimization that scale to deep learning, motivated by hyperparameter optimization and GAN training, where naively applied classical methods often fail on large problems.

Contribution

It develops new tools for efficient nested optimization tailored for large-scale deep learning applications, overcoming limitations of traditional methods.

Findings

1. Scalable nested optimization methods outperform classical approaches, which often fail when applied naively at large scale.

2. The methods are effective for hyperparameter optimization and GAN training.

3. They address challenges specific to large-scale deep learning.

Abstract

Gradient-based optimization has been critical to the success of machine learning, updating a single set of parameters to minimize a single loss. A growing number of applications rely on a generalization of this: bilevel or nested optimization, in which subsets of parameters are updated on different objectives nested inside each other. We focus on the motivating examples of hyperparameter optimization and generative adversarial networks. However, naively applying classical methods often fails when these nested problems are solved at large scale. In this thesis, we build tools for nested optimization that scale to deep learning setups.
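To make the nested structure concrete, here is a minimal sketch of bilevel hyperparameter optimization on a one-dimensional toy problem. It is a textbook illustration rather than the thesis's own method: the inner problem fits a weight `w` under an L2 penalty `lam`, the outer problem tunes `lam` on a validation loss, and the outer gradient comes from the implicit function theorem. All names (`solve_inner`, `hypergradient`, `tune`) and the constants `a` and `b` are hypothetical and chosen only for illustration.

```python
# Toy bilevel problem (all names and constants are illustrative assumptions):
#   inner:  w*(lam) = argmin_w 0.5*(w - a)**2 + 0.5*lam*w**2
#   outer:  min_lam   0.5*(w*(lam) - b)**2

def inner_grad(w, lam, a):
    """Gradient of the inner (training) loss with respect to w."""
    return (w - a) + lam * w

def solve_inner(lam, a, w0=0.0, lr=0.1, steps=200):
    """Approximate w*(lam) by plain gradient descent on the inner loss."""
    w = w0
    for _ in range(steps):
        w -= lr * inner_grad(w, lam, a)
    return w

def hypergradient(w, lam, b):
    """Outer gradient d(val loss)/d(lam) via the implicit function theorem.

    dw*/dlam = -(d2f/dw2)^(-1) * d2f/(dw dlam) = -w / (1 + lam),
    valid when w is (close to) the inner optimum.
    """
    dval_dw = w - b
    dw_dlam = -w / (1.0 + lam)
    return dval_dw * dw_dlam

def tune(a=2.0, b=1.0, lam=0.0, outer_lr=0.5, outer_steps=50):
    """Alternate: solve the inner problem, then take one outer step on lam."""
    for _ in range(outer_steps):
        w = solve_inner(lam, a)
        lam -= outer_lr * hypergradient(w, lam, b)
        lam = max(lam, 0.0)  # keep the penalty non-negative
    return lam, solve_inner(lam, a)

if __name__ == "__main__":
    lam, w = tune()
    print(f"lam = {lam:.4f}, w = {w:.4f}")
```

For this toy problem the inner solution has the closed form w*(lam) = a / (1 + lam), so with a = 2 and b = 1 the outer optimum is lam = 1 and w = 1, which the loop should approach. Solving the inner problem to convergence at every outer step is affordable here, but it is the kind of cost that stops being practical when the inner problem is a deep network.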

Taxonomy

Topics: Embedded Systems Design Techniques

Methods: Sparse Evolutionary Training · Focus