AHD Agent: Agentic Reinforcement Learning for Automatic Heuristic Design

Haoze Lv; Ning Lu; Ziang Zhou; Shengcai Liu

arXiv:2605.08756·cs.AI·May 12, 2026

TL;DR

The paper introduces AHD Agent, a reinforcement learning framework that lets large language models decide, turn by turn, whether to generate a heuristic or call a tool to retrieve evidence from the solving environment, improving automatic heuristic design for NP-hard combinatorial optimization problems.

Contribution

It proposes a novel multi-turn, tool-integrated framework with an agentic RL system that enhances LLMs' ability to autonomously design heuristics across diverse domains.

Findings

01. AHD Agent matches or surpasses the performance of larger models.

02. It requires fewer evaluations than existing methods.

03. It is effective across eight diverse problem domains.

Abstract

Automatic heuristic design (AHD) has emerged as a promising paradigm for solving NP-hard combinatorial optimization problems (COPs). Recent works show that large language models (LLMs), when integrated into well-designed frameworks (i.e., LLM-AHD), can autonomously discover high-performing heuristics. However, existing LLM-AHD frameworks typically treat LLMs as passive generators within fixed workflows, where the model generates heuristics from manually designed, limited context. Such context may fail to capture state-dependent information (e.g., specific failure modes), leading to inefficient trial-and-error exploration. To overcome these limitations, we propose AHD Agent, a novel tool-integrated, multi-turn framework that empowers LLMs to proactively decide whether to generate heuristics or invoke tools to retrieve targeted evidence from the solving environment. To effectively train…
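
The multi-turn decision described in the abstract (generate a heuristic, or call a tool to gather evidence first) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the toy knapsack instance, the `unused_capacity` tool, the `stub_policy` function standing in for the trained LLM, and the `run_episode` loop. The reinforcement learning stage is not shown, since the abstract is cut off before describing it.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical toy problem: greedy knapsack fill ordered by a scoring heuristic.
Item = Tuple[int, int]                 # (value, weight)
Heuristic = Callable[[Item], float]    # higher score means the item is tried first

ITEMS: List[Item] = [(60, 10), (100, 20), (120, 30)]
CAPACITY = 50


def greedy_fill(h: Heuristic) -> Tuple[int, int]:
    """Fill the knapsack in descending heuristic order; return (value, leftover)."""
    total_value, remaining = 0, CAPACITY
    for value, weight in sorted(ITEMS, key=h, reverse=True):
        if weight <= remaining:
            total_value += value
            remaining -= weight
    return total_value, remaining


def evaluate(h: Heuristic) -> int:
    """Environment feedback: objective value reached by the heuristic."""
    return greedy_fill(h)[0]


def unused_capacity(h: Heuristic) -> int:
    """Hypothetical tool: state-dependent evidence about a failure mode."""
    return greedy_fill(h)[1]


TOOLS = {"unused_capacity": unused_capacity}


def by_ratio(item: Item) -> float:
    return item[0] / item[1]


def by_value(item: Item) -> float:
    return float(item[0])


def stub_policy(history: List[Tuple[str, int]]) -> Tuple[str, object]:
    """Stand-in for the LLM: picks 'generate', 'tool', or 'stop' from the history."""
    if not history:
        return ("generate", by_ratio)
    last_kind, last_result = history[-1]
    if last_kind == "generate":
        return ("tool", "unused_capacity")      # gather evidence before retrying
    if last_result > 0:
        return ("generate", by_value)           # capacity was wasted, so revise
    return ("stop", None)


def run_episode(policy, max_turns: int = 5):
    """Multi-turn loop: each turn either evaluates a heuristic or calls a tool."""
    history: List[Tuple[str, int]] = []
    best: Optional[int] = None
    current: Optional[Heuristic] = None
    for _ in range(max_turns):
        action, payload = policy(history)
        if action == "generate":
            current = payload
            score = evaluate(current)
            best = score if best is None else max(best, score)
            history.append(("generate", score))
        elif action == "tool":
            history.append(("tool", TOOLS[payload](current)))
        else:
            break
    return best, history


if __name__ == "__main__":
    best, history = run_episode(stub_policy)
    print(best, history)
```

In this sketch the tool result (leftover capacity after the first attempt) is what prompts the second, better heuristic. A fixed workflow that only reported the final score would not expose that signal.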
