Reasoning in a Combinatorial and Constrained World: Benchmarking LLMs on Natural-Language Combinatorial Optimization

Xia Jiang; Jing Chen; Cong Zhang; Jie Gao; Chengpeng Hu; Chenhao Zhang; Yaoxin Wu; Yingqian Zhang

arXiv:2602.02188 · cs.AI · April 13, 2026

TL;DR

This paper introduces NLCO, a benchmark for evaluating large language models on natural-language combinatorial optimization tasks. It shows where these models succeed and where they fail across problem types and instance sizes.

Contribution

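The paper presents NLCO, a new benchmark of 43 combinatorial optimization problems organized by a four-layer taxonomy (variable types, constraint families, global patterns, and objective classes). The taxonomy enables fine-grained evaluation of LLMs on combinatorial optimization posed in natural language.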

Findings

01

High-performing models solve small instances well but struggle as instance size grows.

02

Set-based tasks are easier for LLMs than graph-structured problems.

03

Model performance degrades with larger instances, even with more reasoning tokens.

Abstract

While large language models (LLMs) have shown strong performance in math and logic reasoning, their ability to handle combinatorial optimization (CO), that is, searching high-dimensional solution spaces under hard constraints, remains underexplored. To bridge the gap, we introduce NLCO, a Natural Language Combinatorial Optimization benchmark that evaluates LLMs on end-to-end CO reasoning: given a language-described decision-making scenario, the model must output a discrete solution without writing code or calling external solvers. NLCO covers 43 CO problems and is organized using a four-layer taxonomy of variable types, constraint families, global patterns, and objective classes, enabling fine-grained evaluation. We provide solver-annotated solutions and comprehensively evaluate LLMs by feasibility, solution optimality, and reasoning efficiency.…
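To make the three evaluation axes concrete, here is a minimal sketch of how a single model answer might be scored. It assumes a hypothetical 0/1 knapsack instance (a set-based problem), an answer already parsed into a list of item indices, and a relative optimality gap against the solver-annotated optimum. The names `KnapsackInstance`, `Score`, and `score_answer` are illustrative only, and the gap formula is a common convention, not necessarily the metric defined in the paper.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class KnapsackInstance:
    # Hypothetical set-based instance: pick a subset of items under a weight capacity.
    weights: Sequence[int]
    values: Sequence[int]
    capacity: int
    solver_optimum: int  # objective of the solver-annotated reference solution


@dataclass
class Score:
    feasible: bool
    objective: Optional[int]
    optimality_gap: Optional[float]  # assumed relative gap; 0.0 means optimal
    reasoning_tokens: int            # proxy for reasoning efficiency


def score_answer(inst: KnapsackInstance, chosen: Sequence[int], reasoning_tokens: int) -> Score:
    """Score a parsed LLM answer (item indices) on feasibility, optimality, and token cost."""
    n = len(inst.weights)
    # Feasibility: indices must be in range, distinct, and respect the capacity.
    valid_ids = all(0 <= i < n for i in chosen) and len(set(chosen)) == len(chosen)
    if not valid_ids or sum(inst.weights[i] for i in chosen) > inst.capacity:
        return Score(False, None, None, reasoning_tokens)
    objective = sum(inst.values[i] for i in chosen)
    # Maximization problem: gap = (optimum - achieved) / optimum.
    if inst.solver_optimum:
        gap = (inst.solver_optimum - objective) / inst.solver_optimum
    else:
        gap = 0.0
    return Score(True, objective, gap, reasoning_tokens)


# Usage: capacity 8, best subset is items {0, 2} with value 10.
inst = KnapsackInstance(weights=[3, 4, 5], values=[4, 5, 6], capacity=8, solver_optimum=10)
print(score_answer(inst, [0, 1], reasoning_tokens=512))
```

Infeasible answers get no objective or gap, so feasibility is reported separately from optimality; a benchmark-level score would then aggregate these per-instance records, for example by problem type or instance size.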

Code & Models

Datasets

summer142857jiang/NLCO
dataset