Poor starting points in machine learning

Mark Tygert

arXiv:1602.02823 · cs.LG · February 10, 2016 · 1 citation

TL;DR

This paper investigates how optimization methods, in particular Nesterov acceleration and minibatch training, behave when the starting point for training is poor or even random.

Contribution

It compares online stochastic gradient descent (the method of Robbins and Monro) with Nesterov acceleration from poor initial points, highlighting the benefits of Nesterov acceleration and of minibatch training in such scenarios.

Findings

01

Nesterov acceleration can improve early training with poor starting points.

02

Minibatch training enhances the effectiveness of Nesterov acceleration.

03

From poor starting points, online stochastic gradient descent, though optimal from good starting points in many settings, may no longer be optimal.

Abstract

Poor (even random) starting points for learning/training/optimization are common in machine learning. In many settings, the method of Robbins and Monro (online stochastic gradient descent) is known to be optimal for good starting points, but may not be optimal for poor starting points -- indeed, for poor starting points Nesterov acceleration can help during the initial iterations, even though Nesterov methods not designed for stochastic approximation could hurt during later iterations. The common practice of training with nontrivial minibatches enhances the advantage of Nesterov acceleration.
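The comparison in the abstract can be illustrated on a toy problem. The sketch below is not the paper's experiment: it assumes a hypothetical two-dimensional, badly conditioned quadratic whose gradient is observed with additive Gaussian noise, a Robbins-Monro step size eta0/(t + 1), and the textbook constant-step Nesterov momentum for strongly convex objectives. All names (CURVATURES, NOISE_STD, robbins_monro, nesterov) are illustrative, and a "minibatch" is modeled simply as averaging several independent noise draws.

```python
import math
import random

# Hypothetical toy problem (not from the paper): f(x) = 0.5 * sum_i h_i * x_i**2,
# a badly conditioned quadratic whose gradient is observed with Gaussian noise.
CURVATURES = (1.0, 0.01)  # largest / smallest eigenvalue: L = 1, mu = 0.01
NOISE_STD = 1.0           # std of the noise in a single-sample gradient


def loss(x):
    return 0.5 * sum(h * xi * xi for h, xi in zip(CURVATURES, x))


def noisy_grad(x, batch, rng):
    """Minibatch gradient: exact gradient plus the mean of `batch` noise draws."""
    return [
        h * xi + sum(rng.gauss(0.0, NOISE_STD) for _ in range(batch)) / batch
        for h, xi in zip(CURVATURES, x)
    ]


def robbins_monro(x0, steps, batch=1, eta0=1.0, seed=0):
    """Plain stochastic gradient descent with decaying step eta0 / (t + 1)."""
    rng = random.Random(seed)
    x = list(x0)
    losses = [loss(x)]
    for t in range(steps):
        g = noisy_grad(x, batch, rng)
        eta = eta0 / (t + 1)
        x = [xi - eta * gi for xi, gi in zip(x, g)]
        losses.append(loss(x))
    return losses


def nesterov(x0, steps, batch=1, eta=1.0, seed=0):
    """Nesterov acceleration with constant step eta = 1/L and the standard
    strongly convex momentum (1 - sqrt(mu/L)) / (1 + sqrt(mu/L))."""
    L, mu = max(CURVATURES), min(CURVATURES)
    q = math.sqrt(mu / L)
    beta = (1.0 - q) / (1.0 + q)
    rng = random.Random(seed)
    x_prev = list(x0)
    x = list(x0)
    losses = [loss(x)]
    for _ in range(steps):
        # Extrapolate along the previous displacement, then take a noisy
        # gradient step from the extrapolated (look-ahead) point.
        y = [xi + beta * (xi - xp) for xi, xp in zip(x, x_prev)]
        g = noisy_grad(y, batch, rng)
        x_prev, x = x, [yi - eta * gi for yi, gi in zip(y, g)]
        losses.append(loss(x))
    return losses


def mean(values):
    return sum(values) / len(values)


start = (100.0, 100.0)  # a "poor" starting point, far from the minimizer at 0
sgd = robbins_monro(start, steps=400)
nag1 = nesterov(start, steps=400, batch=1)
nag64 = nesterov(start, steps=400, batch=64)
print(f"initial loss: {sgd[0]:.1f}")
print(f"loss at t=50: SGD {sgd[50]:.2f}, Nesterov {nag1[50]:.2f}")
print(f"mean loss, t=200..400: Nesterov batch 1 {mean(nag1[200:]):.3f}, "
      f"batch 64 {mean(nag64[200:]):.3f}")
```

From this poor start, the Nesterov iterates should reach a far lower loss within the first few dozen iterations, but with a constant step they settle at a noise floor instead of converging; averaging over a larger minibatch lowers that floor, which is one way to see why minibatches help the accelerated method. The decaying Robbins-Monro step does eventually average out the noise, although on this badly conditioned toy problem it makes progress along the low-curvature direction very slowly.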

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Applications · Machine Learning and Algorithms