OPTAMI: Global Superlinear Convergence of High-order Methods

Dmitry Kamzolov, Dmitry Pasechnyuk, Artem Agafonov, Alexander Gasnikov, Martin Takáč

arXiv:2410.04083 · math.OC · October 15, 2024 · ICLR

TL;DR

This paper advances high-order convex optimization by establishing global superlinear convergence for basic high-order methods such as the Cubic Regularized Newton Method, introducing a practical accelerated tensor method, and providing an open-source library to facilitate their application.

Contribution

It proves global superlinear convergence of high-order methods for μ-strongly star-convex functions, introduces NATA, a practical accelerated tensor method, and releases OPTAMI, a comprehensive library for high-order optimization.
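For reference, one common way to state μ-strong star-convexity for a differentiable f with minimizer x^* is shown below. This is the standard textbook form, assumed here for illustration; the paper's exact definition may be phrased differently (for example, via function values along segments toward x^*).

```latex
f(x^\ast) \;\ge\; f(x) + \langle \nabla f(x),\, x^\ast - x \rangle + \frac{\mu}{2}\,\|x - x^\ast\|^2
\qquad \text{for all } x .
```

Setting μ = 0 gives plain star-convexity. Every μ-strongly convex function satisfies this inequality, since strong convexity gives the same bound for every pair of points and in particular for the pair (x, x^*); the condition only constrains f relative to the minimizer, which is why it also admits some non-convex functions.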

Findings

1. High-order methods exhibit global superlinear convergence for μ-strongly star-convex functions.
2. NATA significantly outperforms classical accelerated high-order methods.
3. The OPTAMI library enables practical application and research of high-order optimization methods.

Abstract

Second-order methods for convex optimization outperform first-order methods in terms of theoretical iteration convergence, achieving rates up to $O(k^{-5})$ for highly-smooth functions. However, their practical performance and applications are limited due to their multi-level structure and implementation complexity. In this paper, we present new results on high-order optimization methods, supported by their practical performance. First, we show that the basic high-order methods, such as the Cubic Regularized Newton Method, exhibit global superlinear convergence for $\mu$-strongly star-convex functions, a class that includes $\mu$-strongly convex functions and some non-convex functions. Theoretical convergence results are both inspired and supported by the practical performance of these methods. Second, we propose a practical version of the Nesterov Accelerated Tensor method, called…
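The Cubic Regularized Newton Method mentioned in the abstract takes, at each iterate x, the step h that minimizes the cubic model ⟨∇f(x), h⟩ + ½⟨∇²f(x)h, h⟩ + (M/6)‖h‖³. The sketch below is a minimal NumPy illustration, not OPTAMI's implementation: it assumes a positive semidefinite Hessian and finds the step by bisection on r = ‖h‖, using the optimality condition (∇²f(x) + (Mr/2)I)h = −∇f(x). The test function, its constants `c` and `mu`, the constant `M`, and the name `cubic_newton_step` are hypothetical choices made for this example.

```python
import numpy as np


def cubic_newton_step(g, H, M, iters=100):
    """Approximately minimize <g, h> + 0.5 h^T H h + (M / 6) ||h||^3.

    Assumes H is positive semidefinite (convex case), so the optimal step
    satisfies (H + (M r / 2) I) h = -g with r = ||h||; r is found by bisection.
    """
    gnorm = np.linalg.norm(g)
    if gnorm == 0.0:
        return np.zeros_like(g)
    eye = np.eye(len(g))

    def step(r):
        # Candidate step for a fixed radius r.
        return -np.linalg.solve(H + 0.5 * M * r * eye, g)

    # ||step(r)|| - r is decreasing in r, and it is <= 0 at r = sqrt(2 ||g|| / M).
    lo, hi = 0.0, np.sqrt(2.0 * gnorm / M)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if np.linalg.norm(step(mid)) > mid:
            lo = mid
        else:
            hi = mid
    return step(hi)


# Hypothetical strongly convex test problem with a Lipschitz Hessian:
# f(x) = sum(log cosh(x - c)) + (mu / 2) ||x||^2.
c = np.array([3.0, -2.0, 5.0])
mu = 0.1


def f(x):
    return np.sum(np.log(np.cosh(x - c))) + 0.5 * mu * x @ x


def grad(x):
    return np.tanh(x - c) + mu * x


def hess(x):
    return np.diag(1.0 / np.cosh(x - c) ** 2 + mu)


M = 1.0  # upper bound on the Hessian's Lipschitz constant (about 0.77 here)
x = np.zeros(3)
values = [f(x)]
for _ in range(50):
    x = x + cubic_newton_step(grad(x), hess(x), M)
    values.append(f(x))

print("final gradient norm:", np.linalg.norm(grad(x)))
```

Choosing M at least as large as the Hessian's Lipschitz constant makes the cubic model an upper bound on f, so the recorded function values are non-increasing. Bisection on the step length is just one simple way to solve the subproblem; it is not necessarily the solver used in the paper or in OPTAMI.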

Peer Reviews

Decision: ICLR 2025 Poster

Reviewer 01 · Rating 6 · Confidence 4

Strengths

Overall, I think that the contributions of the paper are valuable and the technical content would be sufficient for publishing the paper in ICLR. Moreover, I did not find any major mistakes in the proofs. The library appears to be of high quality, and the numerical experiments are satisfactory to me.

Weaknesses

I am not convinced by the structure of the paper and the way it is written. It seems to me that the authors try to answer too many questions for a 10-page paper. I also think that the theoretical claims are not discussed enough. Detailed comments can be found below. The main problem I have with the current version of the paper is that it lacks a clear, unified story and reads as an aggregation of results. In my opinion, this could be improved by modifying the structure of the paper. The OPTAM…

Reviewer 02 · Rating 8 · Confidence 4

Strengths

1. The proposed accelerated variant of the tensor method achieves better empirical performance than the existing (near-)optimal accelerated second-order methods.
2. All methods are systematically implemented and released as a library.

Weaknesses

1. The global linear rate is established by relaxing the required accuracy to exceed the radius of the quadratic convergence region.
2. Some typos:
   - line 143: "$\epsilon \leq c_3 r$" --> "$\epsilon > c_3 r$"
   - Eq. (20): $t \rightarrow 0$ --> $t \rightarrow \infty$?

Reviewer 03 · Rating 6 · Confidence 3

Strengths

The paper is well-organized and easy to follow. The results are novel and should interest audiences in machine learning and optimization. The problems studied in this work are important and applicable to practical problems where computational time is not a critical constraint. The proofs in the main manuscript appear to be correct, although I did not have time to check the proofs in the appendix due to the time limit.

Weaknesses

The main problem with the paper is that the first part (superlinear convergence rate of high-order methods) and the second part (NATA algorithm) seem to be independent and could be separated into two papers. I feel that these two parts address two different topics: the first is mostly theoretical and concerns non-accelerated methods, while the second concerns accelerated methods and their empirical performance. I would suggest the authors split the paper into two and include more detail…

Code & Models

Repositories

OPTAMI/OPTAMI
PyTorch · Official

Taxonomy

Topics: Advanced Optimization Algorithms Research · Iterative Methods for Nonlinear Equations · Matrix Theory and Algorithms