Efficient Dictionary Learning with Gradient Descent
Dar Gilboa, Sam Buchanan, John Wright

TL;DR
This paper proves that randomly initialized gradient descent efficiently reaches a neighborhood of a global optimum in complete orthogonal dictionary learning, a structured nonconvex problem, even though the objective has exponentially many saddle points. The analysis exploits the geometry around those saddle points.
Contribution
It provides convergence guarantees for randomly initialized gradient descent in complete orthogonal dictionary learning, highlighting the role of negative curvature in escaping saddle points.
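To make the setting concrete, here is a minimal sketch of randomly initialized Riemannian gradient descent on the unit sphere for complete orthogonal dictionary learning. It rests on several assumptions that are not taken from this page: the smooth sparsity surrogate h_mu(z) = mu * log cosh(z / mu) used elsewhere in the dictionary-learning literature, Bernoulli-Gaussian sparse coefficients, a fixed step size, and A0 = I (for an orthogonal A0, rotating the data rotates the landscape and the uniform initialization alike). It is not necessarily the objective or step rule analyzed in the paper, and all names and parameters (`mu`, `step`, `theta`, `gd_on_sphere`) are hypothetical.

```python
import math
import random


def sample_data(n, p, theta, rng):
    """Bernoulli-Gaussian coefficients: each entry is N(0, 1) with probability theta, else 0.
    Assumes A0 = I, so the observations Y coincide with the sparse coefficients X."""
    return [[rng.gauss(0.0, 1.0) if rng.random() < theta else 0.0 for _ in range(n)]
            for _ in range(p)]


def normalize(v):
    s = math.sqrt(sum(x * x for x in v))
    return [x / s for x in v]


def riemannian_grad(q, Y, mu):
    """Riemannian gradient on the sphere of the (assumed) surrogate
    f(q) = (1/p) * sum_i mu * log cosh(q . y_i / mu)."""
    n, p = len(q), len(Y)
    g = [0.0] * n
    for y in Y:
        t = math.tanh(sum(a * b for a, b in zip(q, y)) / mu)  # derivative of h_mu
        for k in range(n):
            g[k] += t * y[k] / p
    qg = sum(a * b for a, b in zip(q, g))
    return [gk - qg * qk for gk, qk in zip(g, q)]  # project onto the tangent space at q


def gd_on_sphere(Y, mu=0.1, step=0.2, iters=400, seed=0):
    rng = random.Random(seed)
    n = len(Y[0])
    q = normalize([rng.gauss(0.0, 1.0) for _ in range(n)])  # uniform random point on the sphere
    for _ in range(iters):
        g = riemannian_grad(q, Y, mu)
        q = normalize([qk - step * gk for qk, gk in zip(q, g)])  # gradient step, then retract
    return q


if __name__ == "__main__":
    Y = sample_data(n=4, p=800, theta=0.25, rng=random.Random(1))
    q = gd_on_sphere(Y)
    # With A0 = I, a good solution is close to a signed standard basis vector +-e_i.
    print([round(x, 3) for x in q])
```

Under these assumptions the final iterate should sit near a signed basis vector, so q . y_i recovers (approximately) one row of the sparse coefficient matrix; the retraction by normalization is one simple choice among several.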
Findings
Randomly initialized gradient descent converges to a neighborhood of a global optimum.
Convergence rates scale as low-order polynomials in the dimension.
Negative curvature facilitates escape from saddle points.
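The escape mechanism can be illustrated on a hypothetical toy objective, f(q) = -sum_i q_i^4 on the unit sphere, which is not the dictionary-learning objective from the paper. The point q = (1, 1, 0, 0)/sqrt(2) is a saddle: the Riemannian Hessian there has eigenvalue -4 along (1, -1)/sqrt(2), so with step 0.1 a perturbation in that direction grows by roughly a factor 1 + 0.1 * 4 = 1.4 per iteration. The sketch runs Riemannian gradient descent once exactly at the saddle and once from a nudge of size 1e-6.

```python
import math


def normalize(v):
    s = math.sqrt(sum(x * x for x in v))
    return [x / s for x in v]


def riemannian_gd_step(q, step):
    """One Riemannian gradient step for the toy objective f(q) = -sum_i q_i^4.
    Euclidean gradient is -4 q^3; projecting onto the tangent space at q
    gives -4 q^3 + 4 ||q||_4^4 q. Normalization maps the step back to the sphere."""
    l4 = sum(x ** 4 for x in q)
    grad = [-4 * x ** 3 + 4 * l4 * x for x in q]
    return normalize([x - step * g for x, g in zip(q, grad)])


def run(q0, step=0.1, iters=500):
    q = normalize(q0)
    for _ in range(iters):
        q = riemannian_gd_step(q, step)
    return q


c = 1 / math.sqrt(2)
at_saddle = run([c, c, 0.0, 0.0])         # a critical point: the iterate never moves
perturbed = run([c + 1e-6, c, 0.0, 0.0])  # tiny nudge off the saddle
print([round(x, 4) for x in at_saddle])
print([round(x, 4) for x in perturbed])
```

The unperturbed run stays on the saddle because it is a fixed point, while the perturbed run is pushed along the negative-curvature direction and ends near the minimizer e_1.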
Abstract
Randomly initialized first-order optimization algorithms are the method of choice for solving many high-dimensional nonconvex problems in machine learning, yet general theoretical guarantees cannot rule out convergence to critical points of poor objective value. For some highly structured nonconvex problems, however, the success of gradient descent can be understood by studying the geometry of the objective. We study one such problem, complete orthogonal dictionary learning, and provide convergence guarantees for randomly initialized gradient descent to the neighborhood of a global optimum. The resulting rates scale as low-order polynomials in the dimension even though the objective possesses an exponential number of saddle points. This efficient convergence can be viewed as a consequence of negative curvature normal to the stable manifolds associated with saddle points, and we provide…
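How a sphere-constrained objective can have exponentially many saddle points is easy to see on a hypothetical toy objective, f(q) = -sum_i q_i^4 on the unit sphere S^{n-1} (not the paper's objective). Its critical points are exactly the vectors whose nonzero entries equal +-1/sqrt(k), where k is the support size, giving 3^n - 1 in total. Only the 2n points +-e_i are minima; every other critical point has a tangent direction of curvature -8/k. The sketch enumerates them and checks both facts numerically.

```python
import itertools
import math


def riemannian_grad(q):
    """Riemannian gradient of the toy objective f(q) = -sum_i q_i^4 on the unit sphere."""
    l4 = sum(x ** 4 for x in q)
    return [-4 * x ** 3 + 4 * l4 * x for x in q]


def tangent_curvature(q, v):
    """v^T Hess f(q) v / |v|^2 for a tangent vector v (q . v = 0).
    For this f the Riemannian Hessian acts on tangent vectors as
    P_q(-12 diag(q^2)) + 4 ||q||_4^4 I."""
    l4 = sum(x ** 4 for x in q)
    vv = sum(x * x for x in v)
    return (-12 * sum((a * b) ** 2 for a, b in zip(q, v)) + 4 * l4 * vv) / vv


def classify_critical_points(n):
    """Return (#minima, #critical points with a negative-curvature direction) on S^{n-1}."""
    minima = with_negative_curvature = 0
    for signs in itertools.product((-1, 0, 1), repeat=n):
        support = [i for i, s in enumerate(signs) if s != 0]
        if not support:
            continue  # the zero vector is not on the sphere
        k = len(support)
        q = [s / math.sqrt(k) for s in signs]
        assert max(abs(g) for g in riemannian_grad(q)) < 1e-12  # really a critical point
        if k == 1:
            minima += 1  # +-e_i: curvature +4 in every tangent direction
            continue
        i, j = support[0], support[1]
        v = [0.0] * n
        v[i], v[j] = signs[i], -signs[j]  # tangent: q . v = 1/sqrt(k) - 1/sqrt(k) = 0
        assert tangent_curvature(q, v) < 0  # equals -8/k
        with_negative_curvature += 1
    return minima, with_negative_curvature


for n in (3, 6, 10):
    print(n, classify_critical_points(n))
```

The count of non-minimal critical points, 3^n - 1 - 2n, grows exponentially in n; of these, the 2^n full-support points are local maxima and the rest are saddles. Gradient descent still escapes every one of them because each has a direction of strictly negative curvature.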
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques · Tensor decomposition and applications
