Notes on Worst-case Inefficiency of Gradient Descent Even in R^2
Shiliang Zuo

TL;DR
This paper demonstrates that gradient descent can take exponential time to escape saddle points even in simple two-dimensional non-convex functions, highlighting the importance of stochasticity for efficiency.
Contribution
It provides a theoretical negative result showing the potential inefficiency of gradient descent in escaping saddle points in 2D, emphasizing stochasticity's role.
Findings
Gradient descent may take exponential time to escape saddle points.
Experiments verify the theoretical exponential time result.
Stochasticity is crucial for efficient saddle point escape.
Abstract
Gradient descent is a popular algorithm in optimization, and its performance in convex settings is mostly well understood. In non-convex settings, it has been shown that gradient descent escapes saddle points asymptotically and converges to local minimizers [Lee et al. 2016]. Recent studies also show that a perturbed version of gradient descent is enough to escape saddle points efficiently [Jin et al. 2015, Ge et al. 2017]. In this paper we show a negative result: gradient descent may take exponential time to escape saddle points, even for non-pathological two-dimensional functions. While our focus is theoretical, we also conduct experiments verifying our theoretical result. Through our analysis we demonstrate that stochasticity is essential to escape saddle points efficiently.
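To make the contrast between plain and perturbed gradient descent concrete, here is a minimal sketch on a toy function. It is not the construction used in the paper and does not reproduce the exponential-time bound. The function f(x, y) = x^2/2 + y^4/4 - y^2/2, the step size, the escape threshold of 0.5, and the helper name steps_to_escape are all assumptions chosen for illustration. The sketch only shows that plain gradient descent started on (or very near) the stable manifold of a saddle stalls, while added Gaussian noise removes that dependence on the starting point.

```python
import random


def grad(x, y):
    # Gradient of the toy function f(x, y) = x**2 / 2 + y**4 / 4 - y**2 / 2.
    # It has a strict saddle at (0, 0) and minimizers at (0, 1) and (0, -1).
    return x, y**3 - y


def steps_to_escape(y0, eta=0.1, noise=0.0, seed=0, max_steps=10_000):
    """Count gradient steps until |y| > 0.5, starting from (1.0, y0).

    noise is the standard deviation of Gaussian noise added after each
    step (0.0 gives plain gradient descent). Returns max_steps if the
    iterate never leaves the neighborhood of the saddle's stable manifold.
    """
    rng = random.Random(seed)
    x, y = 1.0, y0
    for step in range(max_steps):
        if abs(y) > 0.5:
            return step
        gx, gy = grad(x, y)
        x -= eta * gx
        y -= eta * gy
        if noise > 0.0:
            # Hypothetical perturbation scheme: isotropic noise every step.
            x += rng.gauss(0.0, noise)
            y += rng.gauss(0.0, noise)
    return max_steps
```

Calling steps_to_escape(0.0) runs plain gradient descent from a point exactly on the stable manifold, where the y-coordinate never moves. Calling steps_to_escape(0.0, noise=1e-3) runs the perturbed variant from the same point. In this toy the escape time of plain gradient descent grows only logarithmically as y0 shrinks; the paper's result concerns a much stronger, exponential slowdown.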
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Complexity and Algorithms in Graphs
