Behavior of Limited Memory BFGS when Applied to Nonsmooth Functions and their Nesterov Smoothings
Azam Asl, Michael L. Overton

TL;DR
This paper investigates the behavior of limited-memory BFGS on nonsmooth functions and their smoothings, revealing that applying L-BFGS to smooth approximations often yields better results than direct application to nonsmooth problems.
Contribution
The paper provides theoretical analysis and empirical evidence on L-BFGS performance on nonsmooth functions and advocates using smooth approximations for better optimization outcomes.
Findings
L-BFGS often breaks down on nonsmooth functions when applied directly.
On smooth approximations, scaled L-BFGS is more efficient than unscaled L-BFGS.
Applying L-BFGS to smooth Nesterov approximations generally yields better results.
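To make "smooth Nesterov approximation" concrete, here is a minimal sketch for the simplest nonsmooth function, the absolute value. It assumes a quadratic prox term, so the smoothing is max over |u| <= 1 of (u*x - mu*u^2/2), which works out to the Huber function. The function names `abs_smoothed` and `abs_smoothed_grad` and the parameter name `mu` are illustrative choices, and this toy function is not one of the paper's test problems.

```python
def abs_smoothed(x: float, mu: float) -> float:
    """Nesterov smoothing of |x| with quadratic prox term (assumed here):
    f_mu(x) = max over |u| <= 1 of (u*x - mu*u*u/2)."""
    if abs(x) <= mu:
        # Quadratic region: the maximizer u = x/mu lies inside [-1, 1].
        return x * x / (2.0 * mu)
    # Linear region: the maximizer is clipped to sign(x).
    return abs(x) - mu / 2.0


def abs_smoothed_grad(x: float, mu: float) -> float:
    """Gradient of f_mu, equal to the maximizer u = clip(x/mu, -1, 1)."""
    return max(-1.0, min(1.0, x / mu))
```

The smoothed function is differentiable everywhere and underestimates |x| by at most mu/2, so a smaller `mu` gives a closer but less well-conditioned approximation.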
Abstract
The motivation to study the behavior of limited-memory BFGS (L-BFGS) on nonsmooth optimization problems is based on two empirical observations: the widespread success of L-BFGS in solving large-scale smooth optimization problems, and the effectiveness of the full BFGS method in solving small to medium-sized nonsmooth optimization problems, based on using a gradient, not a subgradient, oracle paradigm. We first summarize our theoretical results on the behavior of the scaled L-BFGS method with one update applied to a simple convex nonsmooth function that is unbounded below, stating conditions under which the method converges to a non-optimal point regardless of the starting point. We then turn to empirically investigating whether the same phenomenon holds more generally, focusing on a difficult problem of Nesterov, as well as eigenvalue optimization problems arising in semidefinite…
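The difference between the scaled and unscaled variants mentioned in the abstract can be seen in the standard L-BFGS two-loop recursion. The sketch below is a textbook version, not the authors' code: it assumes the usual scaling factor gamma = s·y / y·y taken from the most recent pair, and the names `lbfgs_direction`, `pairs`, and `scaled` are hypothetical. Passing a single (s, y) pair corresponds to L-BFGS with one update.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def lbfgs_direction(
    grad: Sequence[float],
    pairs: Sequence[Tuple[Sequence[float], Sequence[float]]],
    scaled: bool = True,
) -> Vector:
    """Return -H*grad via the two-loop recursion.

    pairs holds (s, y) updates, oldest first, where s is the step taken
    and y is the change in gradient; each pair must satisfy s.y > 0.
    """
    q = list(grad)
    history = []  # (alpha, rho) per pair, newest first
    for s, y in reversed(pairs):
        rho = 1.0 / dot(y, s)
        alpha = rho * dot(s, q)
        q = [qi - alpha * yi for qi, yi in zip(q, y)]
        history.append((alpha, rho))

    # Initial inverse Hessian approximation: gamma*I if scaled, else I.
    gamma = 1.0
    if scaled and pairs:
        s, y = pairs[-1]
        gamma = dot(s, y) / dot(y, y)
    r = [gamma * qi for qi in q]

    # Second loop runs oldest to newest, so the stored values are reversed.
    for (s, y), (alpha, rho) in zip(pairs, reversed(history)):
        beta = rho * dot(y, r)
        r = [ri + (alpha - beta) * si for ri, si in zip(r, s)]
    return [-ri for ri in r]
```

With `scaled=False` the initial matrix is the identity, so components of the gradient orthogonal to the stored pairs pass through unchanged; with `scaled=True` they are multiplied by gamma.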
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Advanced Optimization Algorithms Research · Stochastic Gradient Optimization Techniques
