Learning complexity of gradient descent and conjugate gradient algorithms
Xianqi Jiao, Jia Liu, Zhiping Chen

TL;DR
This paper models the complexity of gradient descent and conjugate gradient algorithms as a statistical learning problem, providing bounds on their learnability and demonstrating the potential for algorithms to be learned from data.
Contribution
It introduces a new cost measure for optimization algorithms, derives bounds on the pseudo-dimension, and extends the analysis from GD to CG algorithms, enabling probabilistic identification of optimal algorithms.
Findings
Derived an upper bound for the pseudo-dimension of GD algorithms.
Extended the analysis to conjugate gradient algorithms for the first time.
Proved that, given sufficient training data, a learning algorithm exists that identifies the optimal algorithm with high probability.
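For context, the usual route from a pseudo-dimension bound to "sufficient data" is the standard uniform-convergence guarantee of statistical learning theory. The statement below is background only, not the paper's exact theorem. The notation is assumed here: an algorithm class \mathcal{A} with pseudo-dimension d, costs bounded in [0, H], instances x_i drawn i.i.d. from a distribution \mathcal{D}, accuracy \epsilon, confidence \delta, and an absolute constant c.

```latex
m \;\ge\; c \left(\frac{H}{\epsilon}\right)^{2} \left( d + \ln\frac{1}{\delta} \right)
\quad\Longrightarrow\quad
\Pr\!\left[\, \sup_{A \in \mathcal{A}}
\left| \frac{1}{m}\sum_{i=1}^{m} \mathrm{cost}(A, x_i)
- \mathbb{E}_{x \sim \mathcal{D}}\big[\mathrm{cost}(A, x)\big] \right| \le \epsilon \right] \;\ge\; 1 - \delta
```

When this holds, the algorithm with the lowest average cost on the sample has expected cost within 2\epsilon of the best algorithm in the class, with probability at least 1 - \delta.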
Abstract
Gradient Descent (GD) and Conjugate Gradient (CG) methods are among the most effective iterative algorithms for solving unconstrained optimization problems, particularly in machine learning and statistical modeling, where they are employed to minimize cost functions. In these algorithms, tunable parameters, such as step sizes or conjugate parameters, play a crucial role in determining key performance metrics, like runtime and solution quality. In this work, we introduce a framework that models algorithm selection as a statistical learning problem, and thus learning complexity can be estimated by the pseudo-dimension of the algorithm group. We first propose a new cost measure for unconstrained optimization algorithms, inspired by the concept of primal-dual integral in mixed-integer linear programming. Based on the new cost measure, we derive an improved upper bound for the…
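To make the setting concrete, here is a minimal sketch of algorithm selection as a learning problem. It assumes random convex quadratic instances, a grid of candidate step sizes as the algorithm class, and a hypothetical integral-style cost that sums the optimality gap over iterations. The paper's actual cost measure is not given in this summary, so `gd_cost`, `sample_instance`, and `erm_step_size` are illustrative names, not the authors' method.

```python
import numpy as np

def gd_cost(A, b, step, iters=50):
    """Run GD on f(x) = 0.5 x^T A x - b^T x and return an integral-style cost.

    Hypothetical cost for illustration: the sum of optimality gaps
    f(x_k) - f(x*) over all iterations, so fast early progress is rewarded.
    """
    x_star = np.linalg.solve(A, b)           # exact minimizer of the quadratic
    f = lambda x: 0.5 * x @ A @ x - b @ x
    f_star = f(x_star)
    x = np.zeros_like(b)
    total = 0.0
    for _ in range(iters):
        total += f(x) - f_star               # accumulate the current gap
        x = x - step * (A @ x - b)           # gradient step with fixed step size
    return total

def sample_instance(rng, dim=5):
    """Draw a random symmetric positive definite quadratic (assumed distribution)."""
    M = rng.standard_normal((dim, dim))
    A = M @ M.T / dim + np.eye(dim)          # eigenvalues are at least 1
    b = rng.standard_normal(dim)
    return A, b

def erm_step_size(instances, candidates):
    """Empirical risk minimization: pick the step size with lowest average cost."""
    avg = [np.mean([gd_cost(A, b, s) for A, b in instances]) for s in candidates]
    return candidates[int(np.argmin(avg))], avg

rng = np.random.default_rng(0)
train = [sample_instance(rng) for _ in range(20)]
candidates = np.linspace(0.01, 0.3, 30)
best_step, avg_costs = erm_step_size(train, candidates)
print(f"selected step size: {best_step:.3f}")
```

The pseudo-dimension of the algorithm class governs how many training instances are needed before the step size selected this way is also near-optimal on unseen instances from the same distribution.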
Taxonomy
Topics: Face and Expression Recognition · Neural Networks and Applications
