Poor starting points in machine learning
Mark Tygert

TL;DR
This paper examines how the choice of optimization method, in particular Nesterov acceleration and minibatch training, affects learning when the starting point is poor or even random.
Contribution
It compares the method of Robbins and Monro (online stochastic gradient descent) with Nesterov acceleration when optimization begins from a poor starting point, and highlights the benefits of Nesterov acceleration and minibatch training in that regime.
Findings
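Nesterov acceleration can help during the initial iterations when the starting point is poor.
Training with nontrivial minibatches enhances the advantage of Nesterov acceleration.
The method of Robbins and Monro, known to be optimal for good starting points in many settings, may not be optimal for poor ones, while Nesterov methods not designed for stochastic approximation could hurt during later iterations.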
Abstract
Poor (even random) starting points for learning/training/optimization are common in machine learning. In many settings, the method of Robbins and Monro (online stochastic gradient descent) is known to be optimal for good starting points, but may not be optimal for poor starting points -- indeed, for poor starting points Nesterov acceleration can help during the initial iterations, even though Nesterov methods not designed for stochastic approximation could hurt during later iterations. The common practice of training with nontrivial minibatches enhances the advantage of Nesterov acceleration.
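The comparison described in the abstract can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the paper's experimental setup: the synthetic least-squares problem, the constant learning rate, the momentum value, the batch size, and all function names (sgd_step, nesterov_step, run_demo, and so on) are hypothetical choices made for the example. The Nesterov update uses the common "lookahead" momentum form, which may differ from the variant analyzed in the paper.

```python
import random


def minibatch_grad(w, data, batch_size, rng):
    """Average least-squares gradient over one randomly drawn minibatch."""
    batch = rng.sample(data, batch_size)
    g = [0.0] * len(w)
    for x, y in batch:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for j, xj in enumerate(x):
            g[j] += err * xj / batch_size
    return g


def sgd_step(w, grad_fn, lr):
    """One plain stochastic gradient step (Robbins-Monro style, constant lr here)."""
    g = grad_fn(w)
    return [wi - lr * gi for wi, gi in zip(w, g)]


def nesterov_step(w, v, grad_fn, lr, mu):
    """One Nesterov step: evaluate the gradient at the lookahead point w + mu * v."""
    look = [wi + mu * vi for wi, vi in zip(w, v)]
    g = grad_fn(look)
    v_new = [mu * vi - lr * gi for vi, gi in zip(v, g)]
    w_new = [wi + vi for wi, vi in zip(w, v_new)]
    return w_new, v_new


def full_loss(w, data):
    """Mean squared error (halved) over the whole data set."""
    total = 0.0
    for x, y in data:
        total += (sum(wi * xi for wi, xi in zip(w, x)) - y) ** 2
    return total / (2 * len(data))


def run_demo(steps=30, batch_size=16, lr=0.05, mu=0.9, seed=0):
    """Run both methods from the same poor starting point on the same minibatches."""
    rng = random.Random(seed)
    w_true = [1.0, -2.0]  # assumed ground truth for the synthetic data
    data = []
    for _ in range(256):
        # The second feature has a much smaller scale, so the problem is ill-conditioned.
        x = [rng.gauss(0, 1), 0.2 * rng.gauss(0, 1)]
        y = sum(wi * xi for wi, xi in zip(w_true, x)) + 0.1 * rng.gauss(0, 1)
        data.append((x, y))

    start = [50.0, 50.0]  # deliberately far from w_true: a "poor" starting point
    w_sgd, w_nes, v = list(start), list(start), [0.0, 0.0]
    # Two identically seeded generators give both methods the same minibatch sequence.
    rng_sgd, rng_nes = random.Random(seed + 1), random.Random(seed + 1)
    losses_sgd = [full_loss(w_sgd, data)]
    losses_nes = [full_loss(w_nes, data)]
    for _ in range(steps):
        w_sgd = sgd_step(
            w_sgd, lambda w: minibatch_grad(w, data, batch_size, rng_sgd), lr)
        w_nes, v = nesterov_step(
            w_nes, v, lambda w: minibatch_grad(w, data, batch_size, rng_nes), lr, mu)
        losses_sgd.append(full_loss(w_sgd, data))
        losses_nes.append(full_loss(w_nes, data))
    return losses_sgd, losses_nes


if __name__ == "__main__":
    sgd_losses, nes_losses = run_demo()
    for k in (0, 5, 10, 20, 30):
        print(k, sgd_losses[k], nes_losses[k])
```

Printing the two loss curves side by side shows how each method progresses from the same poor start. Which one is ahead at a given iteration depends on the learning rate, momentum, batch size, and noise level assumed here, so the sketch is a way to explore the effect rather than a reproduction of the paper's results.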
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Applications · Machine Learning and Algorithms
