Learning to Initialize Gradient Descent Using Gradient Descent

Kartik Ahuja; Amit Dhurandhar; Kush R. Varshney

arXiv:2012.12141·cs.LG·December 23, 2020

TL;DR

This paper introduces a learning-based method for initialization in gradient descent algorithms, improving performance on non-convex problems by leveraging previous solutions instead of relying on random or handcrafted initializations.

Contribution

It proposes a novel approach to learn initialization rules from past solutions, with theoretical guarantees and demonstrated improvements across multiple non-convex tasks.

Findings

01. Consistent performance gains over traditional initialization methods

02. Theoretical guarantees for the proposed initialization approach

03. Effective application across diverse non-convex problems

Abstract

Non-convex optimization problems are challenging to solve; the success and computational expense of a gradient descent algorithm or variant depend heavily on the initialization strategy. Often, either random initialization is used or initialization rules are carefully designed by exploiting the nature of the problem class. As a simple alternative to hand-crafted initialization rules, we propose an approach for learning "good" initialization rules from previous solutions. We provide theoretical guarantees that establish conditions, sufficient in all cases and necessary in some, under which our approach performs better than random initialization. We apply our methodology to various non-convex problems such as generating adversarial examples, generating post hoc explanations for black-box machine learning models, and allocating communication spectrum, and show consistent gains…
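The idea of learning an initialization rule from previous solutions can be illustrated with a toy sketch. This is not the authors' implementation: the one-dimensional objective, the linear form of the initializer, and every name below (`objective`, `solve_with_restarts`, `fit_initializer`, and so on) are assumptions made only for illustration. Each problem instance is indexed by a parameter `c`; past instances are solved with multi-start gradient descent, and a rule mapping `c` to a starting point is then fitted to those solutions, itself by gradient descent.

```python
import numpy as np

# Toy non-convex family (an assumption, not from the paper):
# f(x; c) = 0.1 (x - c)^2 - cos(2 (x - c)), global minimum at x = c
# with value -1, and shallower local minima on either side.
def objective(x, c):
    u = x - c
    return 0.1 * u**2 - np.cos(2.0 * u)

def gradient(x, c):
    u = x - c
    return 0.2 * u + 2.0 * np.sin(2.0 * u)

def gradient_descent(x0, c, lr=0.05, steps=300):
    # Plain gradient descent; works on scalars or arrays of starting points.
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x - lr * gradient(x, c)
    return x

def solve_with_restarts(c, rng, restarts=30):
    # Stand-in for "previous solutions": best result over random restarts.
    starts = rng.uniform(-5.0, 5.0, size=restarts)
    finals = gradient_descent(starts, c)
    return finals[np.argmin(objective(finals, c))]

def fit_initializer(cs, xs, lr=0.1, steps=2000):
    # Fit x0 = w * c + b to past (instance, solution) pairs by
    # gradient descent on the mean squared error.
    w, b = 0.0, 0.0
    for _ in range(steps):
        err = w * cs + b - xs
        w -= lr * 2.0 * np.mean(err * cs)
        b -= lr * 2.0 * np.mean(err)
    return w, b

rng = np.random.default_rng(0)
past_c = rng.uniform(-2.0, 2.0, size=50)
past_x = np.array([solve_with_restarts(c, rng) for c in past_c])
w, b = fit_initializer(past_c, past_x)

# New instance: compare the learned start with a poorly placed one.
new_c = 1.3
learned_final = gradient_descent(w * new_c + b, new_c)
far_final = gradient_descent(new_c + 3.0, new_c)

print("learned init objective:", float(objective(learned_final, new_c)))
print("far init objective:    ", float(objective(far_final, new_c)))
```

In this sketch the learned start lands in the basin of the global minimum, while the distant start settles in a shallower local minimum. A linear rule suffices only because the toy solutions depend linearly on `c`; a richer problem class would presumably need a more expressive model.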

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning · Machine Learning and Algorithms