ProGO: Probabilistic Global Optimizer
Xinyu Zhang, Sujit Ghosh

TL;DR
ProGO is a probabilistic, gradient-free global optimization method that converges to the global optimum under mild regularity conditions. It efficiently approximates global optima of high-dimensional, non-convex functions and is reported to outperform many existing algorithms.
Contribution
ProGO introduces a novel multidimensional integration-based framework with a latent slice sampler for scalable, rigorous global optimization without gradients.
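To make the sampling idea concrete, here is a minimal sketch of gradient-free optimization by sampling. It is not the paper's latent slice sampler: it swaps in a plain univariate slice sampler with interval shrinkage, and it assumes a target density proportional to exp(-k f(x)) on a bounded interval. The objective `f`, the sharpness `k`, the bounds, and all function names are hypothetical choices made for illustration.

```python
import math
import random


def f(x):
    # Hypothetical tilted double-well objective (not from the paper):
    # local minimum near x = 0.96, global minimum near x = -1.04.
    return (x * x - 1.0) ** 2 + 0.3 * x


def slice_sample(logp, x0, lo, hi, n, rng):
    """Univariate slice sampler with shrinkage on the bounded interval [lo, hi]."""
    x, out = x0, []
    for _ in range(n):
        # Draw the auxiliary slice height in log space: log(u * p(x)), u in (0, 1].
        log_y = logp(x) + math.log(1.0 - rng.random())
        left, right = lo, hi
        while True:
            cand = rng.uniform(left, right)
            if logp(cand) >= log_y:
                x = cand  # candidate lies on the slice: accept it
                break
            # Otherwise shrink the bracket toward the current point.
            if cand < x:
                left = cand
            else:
                right = cand
        out.append(x)
    return out


k = 20.0  # assumed sharpness; larger k concentrates mass near the minimizer
rng = random.Random(0)
samples = slice_sample(lambda x: -k * f(x), x0=1.5, lo=-2.0, hi=2.0, n=5000, rng=rng)
kept = samples[500:]  # discard burn-in
mean_x = sum(kept) / len(kept)
best_x = min(kept, key=f)  # gradient-free estimate of the minimizer
print(f"sample mean = {mean_x:.3f}, best x = {best_x:.4f}, f(best x) = {f(best_x):.4f}")
```

The chain starts in the basin of the local minimum (x0 = 1.5), yet only function values are needed to move it toward the global basin, because the target density puts far more mass there. In higher dimensions a coordinate-wise or latent-variable variant would be needed; this sketch covers the one-dimensional case only.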
Findings
Outperforms state-of-the-art methods in speed and regret.
Effective on various non-convex test functions.
Scalable to high-dimensional problems.
Abstract
In the field of global optimization, many existing algorithms face challenges posed by non-convex target functions and high computational complexity or unavailability of gradient information. These limitations, exacerbated by sensitivity to initial conditions, often lead to suboptimal solutions or failed convergence. This is true even for metaheuristic algorithms designed to amalgamate different optimization techniques to improve their efficiency and robustness. To address these challenges, we develop a sequence of multidimensional integration-based methods that we show to converge to the global optima under some mild regularity conditions. Our probabilistic approach does not require the use of gradients and is underpinned by a mathematically rigorous convergence framework anchored in the nuanced properties of nascent optima distribution. In order to alleviate the problem of…
Peer Reviews
Decision: Submitted to ICLR 2024
- The paper is nicely written with clear presentation and explanations of the contributions.
- The proposed algorithm is supported by theoretical asymptotic convergence results.
- The numerical experiments report strong support for the efficiency and accuracy of the algorithm in comparison with several approaches.

- Although the algorithm is based on the convergence of m_k, the rationale behind the proposed sampling procedure for m_k is not clear.
- There is no non-asymptotic analysis that relates the number of function evaluations to the optimality gap.
- Although the theoretical result is nice, it does not explain why this algorithm performs better than alternative approaches. As a result, two numerical experiments might not be sufficient to support the paper's claim.
- The paper misses discussing mode

1) Propose a novel derivative-free optimization algorithm for global optimization with a convergence guarantee.
2) Extend the framework proposed in Luo (2018) to non-compact constraint sets and analyze its global convergence.
3) Provide some promising experimental results.

1) In order to apply the proposed algorithm, the optimal function value $f^*$ is supposed to be known. This assumption does not hold in many practical applications.
2) The algorithm is only tested on two test instances, which can be efficiently solved by a number of existing derivative-free optimization algorithms. More experiments are needed to demonstrate the performance benefit of the algorithm, for example, see “N. Hansen, A. Auger, R. Ros, O. Mersmann, T. Tušar, D. Brockhoff. COCO: A Platform

1. This paper is theoretically interesting, in that it proposes a new type of optimization algorithm and constructs its (asymptotic) convergence theory. This algorithm provably converges to the global minimum of the objective. Interestingly, the authors show that $\int f(x)m_k(x) dx \downarrow f^*$, where $m_k(x)$ is a probability measure constructed using $f$. Further, when $x_k^* = \mathrm{argmax} (m_k(x))$, it holds that $x^*_k \to x^*$ in $\ell_2$ distance.
2. This paper is written clearly and

1. The authors only provide a convergence property as the iteration $k$ goes to infinity. It seems promising that the new method outperforms classic algorithms on some functions, but what about the worst-case upper bound? It would be more interesting (and practical) to have non-asymptotic results, such as how many iterations are needed to approximate a global minimizer within error $\epsilon$. If, in the worst case, the algorithm needs $(1/\epsilon)^d$ calls to the function-value oracle to find
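The convergence statement quoted in the reviews, $\int f(x) m_k(x) dx \downarrow f^*$, can be checked numerically on a toy problem. The sketch below assumes a Boltzmann-type density $m_k(x) \propto \exp(-k f(x))$ on a bounded interval; this form, the test objective `f`, the grid, and the values of `k` are illustrative assumptions rather than details taken from the paper. The integral is approximated by a normalized sum on a uniform grid.

```python
import math


def f(x):
    # Hypothetical tilted double-well objective (not from the paper);
    # its global minimum is near x = -1.04 with f close to -0.305.
    return (x * x - 1.0) ** 2 + 0.3 * x


LO, HI, N = -2.0, 2.0, 4001
grid = [LO + (HI - LO) * i / (N - 1) for i in range(N)]
f_vals = [f(x) for x in grid]
f_star = min(f_vals)  # best value on the grid, used as a stand-in for f^*


def expected_f(k):
    """Grid approximation of the integral of f * m_k, with m_k ~ exp(-k f) (assumed)."""
    # Subtract f_star before exponentiating so every weight is at most 1.
    weights = [math.exp(-k * (v - f_star)) for v in f_vals]
    total = sum(weights)
    mean_f = sum(w * v for w, v in zip(weights, f_vals)) / total
    x_mode = grid[max(range(N), key=weights.__getitem__)]  # argmax of m_k on the grid
    return mean_f, x_mode


ks = [0, 1, 5, 20, 100, 500]
results = [expected_f(k) for k in ks]
for k, (mean_f, x_mode) in zip(ks, results):
    print(f"k = {k:3d}  E[f] = {mean_f:.4f}  gap = {mean_f - f_star:.4f}  mode = {x_mode:.3f}")
```

On this example the expected value decreases monotonically in `k` and approaches the grid minimum from above. The grid itself uses a number of evaluations that grows exponentially with dimension, which is the kind of cost the reviewers' request for a non-asymptotic bound is concerned with; a sampler replaces the grid in practice.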
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Advanced Optimization Algorithms Research · Stochastic Gradient Optimization Techniques
