Sharpness-Aware Black-Box Optimization
Feiyang Ye, Yueming Lyu, Xuehao Wang, Masashi Sugiyama, Yu Zhang, and Ivor Tsang

TL;DR
This paper introduces SABO, a sharpness-aware black-box optimization algorithm that enhances model generalization by reparameterizing the objective and iteratively updating parameters within a neighborhood, supported by theoretical guarantees and empirical results.
Contribution
The paper proposes SABO, a novel sharpness-aware optimization method for black-box problems, with theoretical convergence and generalization guarantees, and demonstrates its effectiveness in prompt fine-tuning.
Findings
SABO improves model generalization in black-box optimization tasks.
Theoretical convergence rate and generalization bounds are established.
Empirical results show SABO outperforms existing methods in prompt fine-tuning.
Abstract
Black-box optimization algorithms have been widely used in various machine learning problems, including reinforcement learning and prompt fine-tuning. However, directly optimizing the training loss value, as commonly done in existing black-box optimization methods, could lead to suboptimal model quality and generalization performance. To address those problems in black-box optimization, we propose a novel Sharpness-Aware Black-box Optimization (SABO) algorithm, which applies a sharpness-aware minimization strategy to improve the model generalization. Specifically, the proposed SABO method first reparameterizes the objective function by its expectation over a Gaussian distribution. Then it iteratively updates the parameterized distribution by approximated stochastic gradients of the maximum objective value within a small neighborhood around the current solution in the Gaussian…
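The two steps named in the abstract (Gaussian reparameterization, then a descent step driven by the worst case in a small neighborhood) can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes an isotropic Gaussian with a fixed standard deviation `sigma`, updates only the mean, and uses a SAM-style Euclidean perturbation of radius `rho` in place of whatever neighborhood the paper defines over the Gaussian distribution. The function names `gaussian_smoothed_grad` and `sharpness_aware_step`, and all parameter names, are hypothetical.

```python
import numpy as np


def gaussian_smoothed_grad(f, mu, sigma, n_samples, rng):
    """Estimate grad_mu of J(mu) = E_{x ~ N(mu, sigma^2 I)}[f(x)].

    Uses antithetic sampling, so only function values of the black-box
    objective f are needed (no gradients of f).
    """
    eps = rng.standard_normal((n_samples, mu.size))
    f_plus = np.array([f(mu + sigma * e) for e in eps])
    f_minus = np.array([f(mu - sigma * e) for e in eps])
    # Average of (f(mu + sigma*eps) - f(mu - sigma*eps)) * eps / (2*sigma)
    return ((f_plus - f_minus)[:, None] * eps).mean(axis=0) / (2.0 * sigma)


def sharpness_aware_step(f, mu, sigma, rho, lr, n_samples, rng):
    """One sharpness-aware update of the Gaussian mean (illustrative only).

    Assumption: the inner maximization is approximated by a single
    normalized ascent step of length rho on the mean, as in standard SAM.
    """
    g = gaussian_smoothed_grad(f, mu, sigma, n_samples, rng)
    # Approximate worst-case point within the rho-neighborhood of mu.
    mu_adv = mu + rho * g / (np.linalg.norm(g) + 1e-12)
    # Descend using the gradient estimated at the perturbed mean.
    g_adv = gaussian_smoothed_grad(f, mu_adv, sigma, n_samples, rng)
    return mu - lr * g_adv
```

A typical use would call `sharpness_aware_step` in a loop with a seeded `np.random.default_rng`, treating `f` as any function that returns a scalar loss for a parameter vector. Setting `rho` to zero reduces the sketch to plain Gaussian-smoothing gradient descent, which makes the effect of the neighborhood term easy to compare.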
Peer Reviews
Decision: ICLR 2025 Poster
1. The paper is clearly written, well organized and easy to follow.
2. I think the whole topic is intriguing and worthy to probe, and the authors give some interesting insights.
3. It is valuable to find that the proposed method could be effective in prompt tuning.
1. My primary concern is the motivation regarding the introduction of SAM in black-box optimization. In other words, can we safely use the proposed method in black-box optimization? The implicit premise of SAM relies on the smoothness of the mapping function. However, black-box optimization may deal with highly complex, non-smooth, or potentially noisy objective landscapes, scenarios much more complicated than optimizing neural networks. In my opinion, it is crucial for the authors to clarify th
This is an interesting paper that combines ideas from sharpness aware minimization (SAM) with black box optimization. SAM has already proven to be a very useful technique to improve generalization in training ML models and the authors present an intuitive motivation for applying these ideas to BBO. SABO is a creative combination of SAM techniques and black box optimization methods. This paper has a strong theoretical analysis of SABO, including convergence rate and generalization error bounds.
Overall, I see no major weaknesses for this paper and some of these points raised here are covered in the questions below which might clear up these apparent weaknesses. For example, in eqn (8) there is a dependence on the parameter $\rho$. The performance of SABO would be sensitive to the neighborhood size $\rho$ in (8) and tuning this parameter would be critical for successful application of this technique. It would be nice to see methods which can automatically adapt this parameter in a usefu
- This paper is novel in that it extends sharpness-aware minimization (SAM) principles, typically used in gradient-based settings, to black-box optimization where gradients are unavailable.
- This paper is supported by rigorous theoretical analysis, including convergence rates and generalization bounds in both full-batch and mini-batch settings.
- The empirical results highlight SABO’s practical value and potential to improve generalization in real-world applications where direct gradient acces
- Due to its reliance on iterative updates to a Gaussian distribution, along with stochastic gradient approximations and KL divergence constraints, this method may incur higher computational costs per iteration compared to some existing black-box optimization methods, especially in high-dimensional settings. A discussion of the time complexity for various black-box optimization algorithms would strengthen the paper.
- This paper introduces additional hyperparameters (e.g., neighborhood size for
Taxonomy
Topics: Cloud Computing and Resource Management
Methods: Sharpness-Aware Minimization
