CALM: Co-evolution of Algorithms and Language Model for Automatic Heuristic Design

Ziyao Huang; Weiwei Wu; Kui Wu; Jianping Wang; Wei-Bin Lee

arXiv:2505.12285·cs.NE·May 20, 2025

Open Access · 3 Reviews

TL;DR

This paper introduces CALM, a hybrid framework for automatic heuristic design that co-evolves a language model with the heuristics it generates, improving optimization performance while running on a single 24GB GPU with a 7B model.

Contribution

It presents a hybrid approach that combines verbal guidance (manipulating prompt generation) with numerical guidance (fine-tuning the LLM via reinforcement learning) to co-evolve the LLM and the heuristics, outperforming state-of-the-art baselines.

Findings

01. Outperforms state-of-the-art baselines across various tasks.

02. Operates efficiently on a single 24GB GPU with a 7B model.

03. Surpasses methods relying solely on verbal guidance.

Abstract

Tackling complex optimization problems often relies on expert-designed heuristics, typically crafted through extensive trial and error. Recent advances demonstrate that large language models (LLMs), when integrated into well-designed evolutionary search frameworks, can autonomously discover high-performing heuristics at a fraction of the traditional cost. However, existing approaches predominantly rely on verbal guidance, i.e., manipulating the prompt generation process, to steer the evolution of heuristics, without adapting the underlying LLM. We propose a hybrid framework that combines verbal and numerical guidance, the latter achieved by fine-tuning the LLM via reinforcement learning based on the quality of generated heuristics. This joint optimization allows the LLM to co-evolve with the search process. Our method outperforms state-of-the-art (SOTA) baselines across various…
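The abstract describes numerical guidance as fine-tuning the LLM with reinforcement learning based on the quality of generated heuristics, and the reviews below refer to GRPO. As a rough illustration only, the sketch below shows the group-relative advantage computation commonly associated with GRPO: rewards for a group of responses sampled from the same prompt are normalized by the group mean and standard deviation. The function name, the example rewards, the use of the population standard deviation, and the `eps` constant are assumptions made for illustration; the paper's actual reward design is not shown on this page.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of sampled responses.

    Hypothetical helper: each reward is assumed to score one heuristic
    generated from the same prompt (higher is better).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    # eps avoids division by zero when every reward in the group is equal
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four heuristics sampled from one prompt
rewards = [0.2, 0.5, 0.5, 0.8]
advantages = group_relative_advantages(rewards)
```

Heuristics that score above the group average receive a positive advantage and those below receive a negative one, so the policy update favors responses that beat their siblings rather than ones that merely score high in absolute terms.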

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 4 · Confidence 4

Strengths

1. The related work section is clearly written and situates the method well within the context of prior research.
2. The paper provides a clear mathematical definition of the task and notations, which are often missing or ambiguous in previous work.
3. The interpretation of LLM-based evolutionary heuristic search as a general reinforcement learning process is insightful and conceptually interesting.

Weaknesses

1. The use of unique token overlap in the “idea” text as a diversity metric feels superficial. Simply counting different words doesn’t necessarily indicate whether two heuristics are algorithmically or behaviorally distinct, so the diversity might be overestimated. It’s efficient, but I think something like embedding-based similarity or even code-level structure comparison would better capture the real heuristic diversity.
2. My biggest concern is still about the RL-based LLM fine-tuning. RL is

Reviewer 02 · Rating 8 · Confidence 3

Strengths

1. The paper is easy to follow and well-written.
2. The idea is good and has practical relevance.
3. Emphasis on local models with low compute budget.
4. Code is provided.
5. Good results compared to some SOTA algorithms.

Weaknesses

1. The authors do not report performance on public benchmarks, e.g., TSPLib, which would make it easier to compare CALM against other methods.
2. The performance gain in non-GRPO settings is not very clear. For example, it seems that MCTS-AHD outperforms CALM without GRPO, which makes me wonder why MCTS-AHD's LLM was not fine-tuned too, and if so, what the performance would be.
3. The novelty of fine-tuning LLMs for AHD is limited due to the existence of concurrent works on fine-tuning, alth

Reviewer 03 · Rating 2 · Confidence 4

Strengths

1. CALM is presented as one of the first LLM-based AHD frameworks to incorporate GRPO.
2. This work proposes new heuristic design operators such as injection, replacement and collapse, which could provide insights for future work. A proof and discussion motivating the introduction of collapse are provided.
3. Experimental results show that CALM outperforms LLM-based heuristic design baselines on multiple combinatorial optimization tasks.

Weaknesses

1. In lines 206-207, the authors claim they use compact summaries instead of full code; however, the response examples in Appendix E show generated operators as exact code, like mutation operators in prior LLM-based AHD methods. Additionally, focusing on granular modifications might limit the search space and reduce code diversity.
2. The proposed method uses seed heuristics to generate initial heuristics; however, the authors do not specify the methodology for selecting these seed heuristics. T

Taxonomy

Topics: Metaheuristic Optimization Algorithms Research · Advanced Multi-Objective Optimization Algorithms · Constraint Satisfaction and Optimization