TL;DR
This paper introduces ARS, an LLM-based automatic routing solver that generates constraint-aware heuristics, significantly improving the efficiency and effectiveness of solving complex real-world vehicle routing problems.
Contribution
The paper presents RoutBench, a comprehensive VRP benchmark, and ARS, an LLM-powered solver that automatically creates heuristics for complex VRPs, addressing a wide range of real-world constraints.
Findings
ARS solves 91.67% of VRPs in benchmarks
ARS outperforms existing LLM-based methods and traditional solvers
ARS achieves at least a 30% improvement across benchmarks
Abstract
Real-world Vehicle Routing Problems (VRPs) are characterized by a variety of practical constraints, making manual solver design both knowledge-intensive and time-consuming. Although there is increasing interest in automating the design of routing algorithms, existing research has explored only a limited array of VRP variants and fails to adequately address the complex and prevalent constraints encountered in real-world situations. To fill this gap, this paper introduces RoutBench, a benchmark of 1,000 VRP variants derived from 24 attributes, for evaluating the effectiveness of automatic routing solvers in addressing complex constraints. Along with RoutBench, we present the Automatic Routing Solver (ARS), which employs Large Language Model (LLM) agents to enhance a backbone algorithm framework by automatically generating constraint-aware heuristic code, based on problem descriptions and…
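To make the idea of plugging constraint-aware heuristic code into a backbone concrete, here is a minimal sketch. It is not the ARS implementation: the function names `capacity_checker` and `greedy_backbone`, the toy data, and the nearest-neighbour backbone are all assumptions chosen for illustration. A hand-written capacity check stands in for the code an LLM agent would generate from a problem description.

```python
from math import dist
from typing import Callable, Dict, List, Set, Tuple

Route = List[int]
Checker = Callable[[Route], bool]


def capacity_checker(demand: Dict[int, int], capacity: int) -> Checker:
    """Hypothetical stand-in for generated constraint code:
    a route is feasible if its total demand fits in one vehicle."""
    def check(route: Route) -> bool:
        return sum(demand[c] for c in route) <= capacity
    return check


def greedy_backbone(coords: Dict[int, Tuple[float, float]],
                    customers: List[int],
                    check: Checker,
                    depot: int = 0) -> List[Route]:
    """Toy backbone (an assumption, not the paper's algorithm):
    nearest-neighbour construction that consults the checker
    before every extension and opens a new route when none is feasible."""
    unvisited: Set[int] = set(customers)
    routes: List[Route] = []
    while unvisited:
        route: Route = []
        pos = depot
        while True:
            feasible = [c for c in unvisited if check(route + [c])]
            if not feasible:
                break
            nxt = min(feasible, key=lambda c: dist(coords[pos], coords[c]))
            route.append(nxt)
            unvisited.remove(nxt)
            pos = nxt
        if not route:
            raise ValueError("a customer violates the constraint on its own")
        routes.append(route)
    return routes


# Toy instance: depot 0 and three customers with demand 2 each.
coords = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (2.0, 0.0), 3: (0.0, 3.0)}
demand = {1: 2, 2: 2, 3: 2}
routes = greedy_backbone(coords, [1, 2, 3], capacity_checker(demand, capacity=4))
print(routes)
```

The backbone never inspects the constraint itself; it only calls `check`. Swapping in a different checker (time windows, for example) would leave the search loop unchanged, which is the separation the abstract describes.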
Peer Reviews
Decision: Submitted to ICLR 2026
- The numerical experiments convincingly show that combining RAG-based constraint code generation with existing heuristics outperforms the baseline approach of prompting LLMs directly.
- The proposed approach of translating natural-language constraints into executable programs has strong potential to simplify the process of developing constraint-specific heuristic algorithms.
- The ablation study shows that leveraging the constraint database significantly improves constraint satisfaction. However, in practical applications, problem constraints are not always well-studied or included in such databases. Thus, the generality of the proposed method may be limited when dealing with novel or previously unseen constraints.
- Although the framework aims to extend existing local search heuristics to handle diverse constraints, the paper does not analyze the scalability of th
• The paper releases RoutBench: 1,000 VRP variants (each with NL description, data, and validation code), which represents a broad benchmark contribution.
• The ablations show each component of the framework matters, indicating the effectiveness of the design.
• The evaluation is based on the correctness/coverage of the per-instance validation code. It is not reliable enough to ensure that the generated program works for a class of VRP. If a checker under-specifies edge cases, SR can be overstated.
• Best-Known Solutions (BKS) for RoutBench are produced by ARS itself under strict stops. It seems that this method cannot ensure the actual (near)optimal solution, and thus leads to a benchmark circularity risk.
• The superior performance partly reflect
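The concern about under-specified checkers can be illustrated with a small sketch. It assumes SR denotes the percentage of candidate solutions that pass the validation code; the validators, the toy data, and the name `success_rate` are hypothetical and not taken from RoutBench.

```python
from typing import Callable, Dict, List

Solution = List[List[int]]  # a list of routes, each a list of customer ids

# Toy instance (assumed): three customers, demand 2 each, vehicle capacity 4.
DEMAND: Dict[int, int] = {1: 2, 2: 2, 3: 2}
CAPACITY = 4


def loose_validator(sol: Solution) -> bool:
    """Under-specified check: only verifies the capacity of each route."""
    return all(sum(DEMAND[c] for c in route) <= CAPACITY for route in sol)


def strict_validator(sol: Solution) -> bool:
    """Also requires every customer to be visited exactly once."""
    visited = sorted(c for route in sol for c in route)
    return loose_validator(sol) and visited == sorted(DEMAND)


def success_rate(solutions: List[Solution],
                 validator: Callable[[Solution], bool]) -> float:
    """Percentage of solutions accepted by the given validator."""
    passed = sum(1 for sol in solutions if validator(sol))
    return 100.0 * passed / len(solutions)


candidates: List[Solution] = [
    [[1, 2], [3]],      # feasible
    [[1, 2]],           # skips customer 3
    [[1, 2, 3]],        # exceeds capacity
    [[1], [2], [3]],    # feasible
]

loose_sr = success_rate(candidates, loose_validator)
strict_sr = success_rate(candidates, strict_validator)
print(loose_sr, strict_sr)
```

The loose validator accepts the solution that skips a customer, so it reports a higher rate than the strict one on the same candidates. Any gap of this kind between a benchmark's validation code and the true constraint set inflates the reported figure.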
1. The paper's primary strength is the innovative design of the ARS framework, which intelligently separates the general solver backbone from the LLM-generated, problem-specific heuristic components. This is a clever and effective way to combine the reasoning power of LLMs with the proven search capabilities of metaheuristics.
2. The introduction of RoutBench is a major contribution in its own right. It provides a large-scale, diverse, and well-structured testbed for evaluating the generalizati
1. While the paper proposes the ARS framework, its originality is limited. The framework's RAG component utilizes existing technology, the "checker" and "scorer" steps are based on established ideas from heuristic VRP solvers, and the subsequent heuristic algorithm is also a pre-existing method. Overall, the lack of substantial novel content is the paper's most significant weakness.
2. The paper relies on a single-point-based search framework. It is unclear how the LLM-generated Constraint-Awar
