Generalizable Heuristic Generation Through LLMs with Meta-Optimization
Yiding Shi, Jianan Zhou, Wen Song, Jieyi Bi, Yaoxin Wu, Zhiguang Cao, Jie Zhang

TL;DR
This paper introduces MoH, a meta-learning framework that uses LLMs to automatically generate diverse heuristic-optimizers, improving generalization and performance on combinatorial optimization problems without relying on predefined heuristics.
Contribution
MoH is a novel meta-optimization framework that autonomously constructs heuristic-optimizers with LLMs, enhancing heuristic diversity and generalization across multiple tasks.
Findings
Achieves state-of-the-art results on classic COPs.
Demonstrates strong cross-size generalization.
Constructs interpretable heuristic-optimizers.
Abstract
Heuristic design with large language models (LLMs) has emerged as a promising approach for tackling combinatorial optimization problems (COPs). However, existing approaches often rely on manually predefined evolutionary computation (EC) heuristic-optimizers and single-task training schemes, which may constrain the exploration of diverse heuristic algorithms and hinder the generalization of the resulting heuristics. To address these issues, we propose Meta-Optimization of Heuristics (MoH), a novel framework that operates at the optimizer level, discovering effective heuristic-optimizers through the principle of meta-learning. Specifically, MoH leverages LLMs to iteratively refine a meta-optimizer that autonomously constructs diverse heuristic-optimizers through (self-)invocation, thereby eliminating the reliance on a predefined EC heuristic-optimizer. These constructed…
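The two-level structure described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the LLM call is replaced by a hypothetical stub (`stub_llm_propose`), a "heuristic" is reduced to a single float, a "heuristic-optimizer" is a simple hill climber parameterized by a step size, and self-invocation is not modeled. All names and the utility function are assumptions made for illustration only.

```python
import random
from typing import Callable, List, Tuple

Utility = Callable[[float], float]
Optimizer = Callable[[float, Utility, random.Random], float]


def utility(h: float) -> float:
    """Toy utility (assumption): higher is better, maximized at h = 3.0."""
    return -(h - 3.0) ** 2


def make_hill_climber(step: float) -> Optimizer:
    """Build a toy heuristic-optimizer that keeps random perturbations that improve utility."""
    def optimize(h: float, u: Utility, rng: random.Random) -> float:
        for _ in range(20):
            candidate = h + rng.uniform(-step, step)
            if u(candidate) > u(h):
                h = candidate
        return h
    return optimize


def stub_llm_propose(step: float, rng: random.Random) -> List[float]:
    """Hypothetical stand-in for an LLM that would rewrite the optimizer's code.

    Here it only proposes new step sizes for the hill climber.
    """
    return [step * 0.5, step * 2.0, step * rng.uniform(0.5, 2.0)]


def meta_optimize(rounds: int = 5, seed: int = 0) -> Tuple[float, float, List[float]]:
    """Outer loop: score each proposed optimizer by the heuristic it produces."""
    rng = random.Random(seed)
    step = 0.01
    heuristic = make_hill_climber(step)(0.0, utility, rng)
    history = [utility(heuristic)]
    for _ in range(rounds):
        for cand_step in stub_llm_propose(step, rng):
            cand_heuristic = make_hill_climber(cand_step)(0.0, utility, rng)
            # Keep an optimizer only if its heuristic is strictly better.
            if utility(cand_heuristic) > utility(heuristic):
                step, heuristic = cand_step, cand_heuristic
        history.append(utility(heuristic))
    return step, heuristic, history
```

Calling `meta_optimize()` returns the selected step size, the best toy heuristic found, and the best utility after each round. The point of the sketch is only the nesting: the outer loop searches over optimizers, and each optimizer is judged by the heuristics it yields.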
Peer Reviews
Decision: ICLR 2026 Poster
Ablations show benefits from the proposed idea and examine different LLM backends and population sizes. The paper provides concrete examples/analysis indicating discovered strategies can resemble or hybridize classic metaheuristics. The paper shows strong empirical results and cross-size generalization on TSP and Online BPP. The proposed MoH often outperforms baselines. Multi-task training and controlled evaluation budgets are thoughtfully designed to encourage generalization.
Improvements over strong baselines can be modest in some settings, and this deserves further discussion. According to the experimental setup, the main tables emphasize best-of-three runs, which can overstate gains relative to mean/variance reporting. Results may be sensitive to the choice of LLM and prompts, and robustness under model drift is uncertain.
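The reviewer's concern about best-of-three reporting can be made concrete with a small sketch. The numbers below are invented for illustration and are not results from the paper; `method_a` and `method_b` are hypothetical labels, and the values stand for optimality gaps where lower is better.

```python
from statistics import mean, stdev
from typing import Dict, List


def summarize(gaps: List[float]) -> Dict[str, float]:
    """Summarize optimality gaps (lower is better) from repeated runs."""
    return {"best": min(gaps), "mean": mean(gaps), "std": stdev(gaps)}


# Hypothetical gaps (%) from three independent runs of two methods.
runs = {
    "method_a": [1.2, 2.0, 2.8],  # high variance
    "method_b": [1.9, 2.0, 2.1],  # low variance
}

summary = {name: summarize(gaps) for name, gaps in runs.items()}
for name, stats in summary.items():
    print(name, {k: round(v, 3) for k, v in stats.items()})
```

In this made-up case the best-of-three column favors `method_a`, while the means are the same and `method_a` has the larger spread, which is why reporting mean and standard deviation alongside the best run gives a fuller picture.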
- The core idea of optimizing the optimizer (via a meta-prompt) rather than just the heuristics themselves is a novel and interesting approach to leveraging LLMs in the optimization domain.
- The paper correctly identifies generalizability as a key weakness in existing heuristic generation methods and explicitly designs its utility function to reward performance across different task distributions (i.e., problem sizes).
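A utility that rewards performance across problem sizes, as this review describes, might look like the sketch below. The paper's actual utility function is not given in this page, so everything here is an assumption: the nearest-neighbor TSP heuristic, the instance sizes, the number of instances, and the normalization of tour length by the square root of the instance size.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
TspHeuristic = Callable[[List[Point]], List[int]]


def nearest_neighbor_tour(points: List[Point]) -> List[int]:
    """Classic nearest-neighbor construction, starting from city 0."""
    unvisited = list(range(1, len(points)))
    tour = [0]
    while unvisited:
        last = points[tour[-1]]
        nxt = min(unvisited, key=lambda j: math.dist(last, points[j]))
        unvisited.remove(nxt)
        tour.append(nxt)
    return tour


def tour_length(points: List[Point], tour: List[int]) -> float:
    """Length of the closed tour that visits the cities in the given order."""
    n = len(tour)
    return sum(math.dist(points[tour[i]], points[tour[(i + 1) % n]]) for i in range(n))


def multi_size_utility(
    heuristic: TspHeuristic,
    sizes: Sequence[int] = (20, 50, 100),
    instances_per_size: int = 3,
    seed: int = 0,
) -> float:
    """Average size-normalized tour length over several sizes, negated so higher is better."""
    rng = random.Random(seed)  # same seed gives the same instances for every heuristic
    per_size = []
    for n in sizes:
        scores = []
        for _ in range(instances_per_size):
            pts = [(rng.random(), rng.random()) for _ in range(n)]
            scores.append(tour_length(pts, heuristic(pts)) / math.sqrt(n))
        per_size.append(sum(scores) / len(scores))
    # Each size contributes equally, so no single size dominates the score.
    return -sum(per_size) / len(per_size)
```

Weighting each size equally is one simple way to discourage a heuristic from specializing to a single size; other aggregations (for example, worst-case over sizes) would be equally plausible under the same description.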
- While the method is described with complex terminology, its core mechanism appears to be a sophisticated form of meta-prompt optimization. The "meta-optimizer" is, in essence, a highly-tuned prompt that guides the LLM to sample effective heuristics. This idea, while implemented well, feels intuitive and perhaps more incremental than a fundamental breakthrough, which may limit the paper's conceptual contribution.
- The paper's "generalizability" claim is weak and potentially misleading. Firstly
1. MoH introduces the idea of meta-optimization within LLM-based heuristics for combinatorial optimization, addressing key limitations of existing methods like the lack of diversity in heuristic exploration and challenges in generalization.
2. Extensive experiments demonstrate that MoH outperforms both traditional and LLM-based heuristic methods across various settings, showing its ability to tackle problems like TSP and BPP effectively.
1. While the authors claim that MoH does not incur significant computational overhead, the introduction of a meta-optimization layer adds complexity, which may increase the time and resources required, especially for large problems.
2. Though MoH performs well on classical COPs, its scalability to more complex or non-classical optimization problems (e.g., real-world applications) has not been thoroughly tested.
3. While multi-task learning is a strength, it could also lead to overfitting on the
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Model-Driven Software Engineering Techniques
