Optimistic Gradient Learning with Hessian Corrections for High-Dimensional Black-Box Optimization
Yedidya Kfir, Elad Sarafian, Sarit Kraus, Yoram Louzoun

TL;DR
This paper introduces the OHGL algorithm, combining optimistic and higher-order gradient learning techniques to improve black-box optimization in high-dimensional, complex problems, achieving state-of-the-art results.
Contribution
It proposes two novel gradient learning variants, Optimistic Gradient Learning (OGL) and Higher-Order Gradient Learning (HGL), which are combined in OHGL to improve robustness and accuracy in high-dimensional black-box optimization tasks.
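The two variants can be illustrated with a simplified, pointwise sketch. It relies on assumptions that the text above does not state. First, instead of training a neural gradient model, it fits a single gradient vector at one point. Second, the Hessian correction is a symmetric matrix fitted jointly with the gradient from a second-order Taylor model. Third, the "optimistic" weighting is a softmax over negative value changes with a temperature, a hypothetical rule rather than the paper's formula. The names `quadratic_features` and `fit_local_gradient` are illustrative only.

```python
import numpy as np


def quadratic_features(deltas):
    """Regression features for the second-order model g.d + 0.5 * d^T H d.

    H is symmetric, so only its upper triangle (i <= j) is parameterized:
    H_ii multiplies 0.5 * d_i**2 and H_ij (i < j) multiplies d_i * d_j.
    """
    dim = deltas.shape[1]
    iu, ju = np.triu_indices(dim)
    scale = np.where(iu == ju, 0.5, 1.0)
    second_order = deltas[:, iu] * deltas[:, ju] * scale
    # First `dim` columns fit the gradient, the rest fit the Hessian entries.
    return np.hstack([deltas, second_order])


def fit_local_gradient(f, x, deltas, order=2, temperature=None):
    """Weighted least-squares gradient estimate at x from black-box queries.

    order=1: first-order Taylor model, f(x + d) - f(x) ~ g.d (EGL-style).
    order=2: adds a jointly fitted Hessian correction (HGL-like, simplified).
    temperature: if set, samples are weighted by softmax(-df / temperature),
      so queries with lower (better, when minimizing) values count more.
      This "optimistic" weighting rule is a hypothetical choice.
    Returns (gradient estimate, per-sample weights).
    """
    fx = f(x)
    df = np.array([f(x + d) for d in deltas]) - fx
    feats = deltas if order == 1 else quadratic_features(deltas)
    if temperature is None:
        weights = np.full(len(df), 1.0 / len(df))
    else:
        logits = -(df - df.min()) / temperature  # max logit is 0: no overflow
        weights = np.exp(logits)
        weights /= weights.sum()
    # Minimizing sum_i w_i * residual_i**2 equals plain lstsq on sqrt(w)-scaled rows.
    sqrt_w = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(feats * sqrt_w[:, None], df * sqrt_w, rcond=None)
    return coef[: x.shape[0]], weights


# Usage on a toy quadratic whose true gradient is A @ z + b.
A = np.array([[3.0, 0.5, 0.0], [0.5, 2.0, 0.3], [0.0, 0.3, 1.0]])
b = np.array([1.0, -1.0, 0.5])


def f(z):
    return float(0.5 * z @ A @ z + b @ z)


x0 = np.array([0.2, -0.4, 1.0])
# Deliberately wide neighborhood, so curvature matters.
deltas = 0.5 * np.random.default_rng(0).normal(size=(60, 3))
g1, _ = fit_local_gradient(f, x0, deltas, order=1)
g2, w = fit_local_gradient(f, x0, deltas, order=2, temperature=1.0)
```

On this quadratic the second-order fit recovers the gradient essentially exactly even with a wide sampling neighborhood. The first-order fit, by contrast, absorbs error from the unmodeled curvature, which is the intuition behind adding a Hessian term.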
Findings
OHGL achieves state-of-the-art performance on synthetic benchmarks.
OHGL effectively applies to high-dimensional ML tasks like adversarial training.
The methods outperform existing black-box optimization approaches.
Abstract
Black-box algorithms are designed to optimize functions without relying on their underlying analytical structure or gradient information, making them essential when gradients are inaccessible or difficult to compute. Traditional methods for solving black-box optimization (BBO) problems predominantly rely on non-parametric models and struggle to scale to large input spaces. Conversely, parametric methods that model the function with neural estimators and obtain gradient signals via backpropagation may suffer from significant gradient errors. A recent alternative, Explicit Gradient Learning (EGL), which directly learns the gradient using a first-order Taylor approximation, has demonstrated superior performance over both parametric and non-parametric methods. In this work, we propose two novel gradient learning variants to address the robustness challenges posed by high-dimensional,…
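The first-order Taylor objective behind EGL can be sketched as follows. EGL itself trains a network g_θ(x) over the whole domain. This sketch is a simplification that solves the same condition, f(x + d) - f(x) ≈ g · d for small perturbations d, at a single point by linear least squares, using function values only. The helpers `sample_ball` and `first_order_gradient_estimate` are hypothetical names.

```python
import numpy as np


def sample_ball(rng, n, dim, eps):
    """Draw n points uniformly from the dim-dimensional ball of radius eps."""
    directions = rng.normal(size=(n, dim))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    radii = eps * rng.uniform(size=(n, 1)) ** (1.0 / dim)
    return directions * radii


def first_order_gradient_estimate(f, x, eps=1e-3, n_samples=64, seed=None):
    """Estimate grad f(x) from function values only.

    Fits g to the first-order Taylor condition f(x + d) - f(x) ~ g . d over
    perturbations d with ||d|| <= eps, by linear least squares.
    """
    rng = np.random.default_rng(seed)
    deltas = sample_ball(rng, n_samples, x.shape[0], eps)
    fx = f(x)
    df = np.array([f(x + d) for d in deltas]) - fx  # black-box queries only
    g, *_ = np.linalg.lstsq(deltas, df, rcond=None)
    return g


def sphere(z):
    return float(np.sum(z ** 2))  # true gradient: 2 * z


x0 = np.array([1.0, -2.0, 0.5, 3.0])
g_hat = first_order_gradient_estimate(sphere, x0, seed=0)
```

Shrinking `eps` reduces the error from the unmodeled second-order term, at the cost of more numerically sensitive differences f(x + d) - f(x).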
Peer Reviews
Decision: Submitted to ICLR 2025
The paper appears to be well-grounded in both theoretical and experimental methodologies. The use of both first- and second-order gradient information for better accuracy in high-dimensional spaces is a technically sound choice, demonstrating an understanding of gradient approximation complexities in BBO. The breakdown of components like Optimistic Gradient Learning and Higher-Order Gradient Learning helps clarify how each extension builds on EGL, and the inclusion of figures and empirical resu
1. Computational Complexity and Efficiency: The inclusion of Hessian corrections, while beneficial for accuracy, introduces substantial computational cost, particularly in high-dimensional spaces.
2. Generalization of Results: The results focus heavily on the COCO test suite and two specific applications. However, broader generalizability to other complex black-box settings, such as reinforcement learning or sequential decision-making tasks, remains unaddressed.
3. Trust Region's Limitation
1. The concept of enhancing Explicit Gradient Learning (EGL) with second-order corrections is both interesting and novel, to the best of my knowledge.
2. The paper is generally well-written and presents its ideas in a clear and accessible manner.
1. My primary concern relates to the computational cost associated with Hessian computation or approximation, which may hinder the scalability of the proposed method. Given that memory- and computation-efficient Hessian approximation techniques are well-established in the literature, I strongly recommend that the authors evaluate their proposed method using more efficient approximations at larger scales.
2. The enhancements over EGL are attained through the integration of several design element
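The memory- and computation-efficient Hessian approximations mentioned in this review can be made concrete with two standard matrix-free techniques. These are generic methods, not the approach evaluated in the paper. The first is a central-difference Hessian-vector product. The second is a Hutchinson-style stochastic estimator of the Hessian diagonal. Neither ever forms the d × d Hessian. Here `grad_fn` stands for any gradient estimator, for example a learned gradient model, and is assumed smooth enough for finite differences. The function names are hypothetical.

```python
import numpy as np


def hessian_vector_product(grad_fn, x, v, h=1e-4):
    """Approximate H(x) @ v with a central difference of grad_fn along v.

    Needs two gradient evaluations and O(d) memory; the d x d Hessian is never formed.
    """
    return (grad_fn(x + h * v) - grad_fn(x - h * v)) / (2.0 * h)


def hutchinson_diagonal(grad_fn, x, n_probes=100, seed=None):
    """Estimate diag(H(x)) by averaging z * (H z) over random +/-1 probe vectors z."""
    rng = np.random.default_rng(seed)
    estimate = np.zeros_like(x)
    for _ in range(n_probes):
        z = rng.choice([-1.0, 1.0], size=x.shape)
        estimate += z * hessian_vector_product(grad_fn, x, z)
    return estimate / n_probes


# Toy gradient with a known Hessian A (stand-in for a learned gradient model).
A = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 0.5], [0.0, 0.5, 2.0]])


def grad_fn(z):
    return A @ z


x0 = np.array([0.5, -1.0, 2.0])
v = np.array([1.0, 0.0, -1.0])
hv = hessian_vector_product(grad_fn, x0, v)
diag_est = hutchinson_diagonal(grad_fn, x0, n_probes=2000, seed=0)
```

Each probe costs one Hessian-vector product, so the diagonal estimate trades a controllable amount of sampling noise for memory that grows linearly, not quadratically, with the dimension.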
1. The proposed Hessian corrections and weighted gradient are well-motivated, effectively addressing the limitations of the previous EGL framework.
2. Real-world high-dimensional examples, including adversarial attacks and code generation, demonstrate the scalability and efficiency of this framework.
1. Compared to previous work, such as EGL or model-based methods [1], the theoretical understanding of the proposed Hessian corrections and weighted gradients is lacking. A more rigorous theoretical analysis would help justify the design choices.
2. The work introduces several algorithm-level designs in Sec. 5, such as adaptive sampling size and trust-region management. The tolerance analysis in Section 6.2 suggests that the influence of these designs can accumulate over time, potentially leadin
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Face and Expression Recognition · Neural Networks and Applications
