DCP-Bench-Open: Evaluating LLMs for Constraint Modelling of Discrete Combinatorial Problems

Kostis Michailidis; Dimos Tsouros; Tias Guns

arXiv:2506.06052 · cs.AI · January 29, 2026

TL;DR

This paper introduces DCP-Bench-Open, a diverse benchmark dataset for evaluating how well large language models perform constraint modelling across a wide range of discrete combinatorial problems. Constraint modelling requires significant expertise and remains a key bottleneck to the wider adoption of constraint solving.

Contribution

The paper provides a new, diverse benchmark dataset for evaluating LLMs on constraint modelling, covering multiple problem types and modelling frameworks, and systematically assesses a range of LLMs and prompting strategies.

Findings

01 Higher performance with high-level Python frameworks

02 Prompt-based and inference-time methods improve accuracy

03 Achieved up to 91% accuracy on the benchmark

Abstract

Discrete Combinatorial Problems (DCPs) are prevalent in industrial decision-making and optimisation. However, while constraint solving technologies for DCPs have advanced significantly, the core process of formalising them, namely constraint modelling, requires significant expertise and remains a bottleneck for wider adoption. Aiming to alleviate this bottleneck, recent studies have explored using Large Language Models (LLMs) to transform combinatorial problem descriptions into executable constraint models. However, the existing evaluation datasets for discrete constraint modelling are often limited to small, homogeneous, or domain-specific problems, which do not capture the diversity of real-world scenarios. This work addresses this gap by introducing DCP-Bench-Open, a novel benchmark that includes a diverse set of well-known discrete combinatorial problems sourced from the Constraint…

Code & Models

Datasets

kostis-init/CP-Bench
dataset · 11 dl

Taxonomy

Methods: Sparse Evolutionary Training