ConstraintBench: Benchmarking LLM Constraint Reasoning on Direct Optimization
Joseph Tso, Preston Schmittou, Quan Huynh, Jibran Hutchins

TL;DR
This paper introduces ConstraintBench, a benchmark for evaluating large language models' ability to directly solve constrained optimization problems across multiple domains, revealing current models' limitations in feasibility and optimality.
Contribution
ConstraintBench provides an evaluation framework for LLMs on direct constrained optimization, with ground-truth solutions verified by Gurobi, and identifies feasibility and optimality as the key challenges.
Findings
Best model achieves 65.0% feasibility
Feasible solutions reach 89-96% of optimal objective
Models struggle with joint feasibility and optimality
Abstract
Large language models are increasingly applied to operational decision-making where the underlying structure is constrained optimization. Existing benchmarks evaluate whether LLMs can formulate optimization problems as solver code, but leave open a complementary question. Can LLMs directly produce correct solutions to fully specified constrained optimization problems without access to a solver? We introduce ConstraintBench, a benchmark for evaluating LLMs on direct constrained optimization across 10 operations research domains, with all ground-truth solutions verified by the Gurobi solver. Each task presents a natural-language scenario with entities, constraints, and an optimization objective; the model must return a structured solution that a deterministic verifier checks against every constraint and the solver-proven optimum. We evaluate six frontier models on 200 tasks and find that…
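To make the verification step concrete, here is a minimal Python sketch of a deterministic verifier of the kind the abstract describes: it checks a proposed solution against every constraint and, if the solution is feasible, compares its objective to a known optimum. The toy knapsack task, the `KnapsackTask` and `Verdict` types, and the `verify` function are all assumptions made for illustration; they are not ConstraintBench's actual task format or code. The optimum is supplied as a plain number standing in for a solver-proven value.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class KnapsackTask:
    """Hypothetical toy task: pick items to maximize value under a weight cap."""
    weights: dict[str, int]
    values: dict[str, int]
    capacity: int
    optimal_value: int  # assumed to come from an exact solver run beforehand


@dataclass(frozen=True)
class Verdict:
    feasible: bool
    violations: list[str]
    objective: int | None
    optimality_ratio: float | None  # objective / optimum, only when feasible


def verify(task: KnapsackTask, chosen: list[str]) -> Verdict:
    """Check every constraint, then score the objective against the optimum."""
    violations: list[str] = []

    # Constraint 1: every chosen item must exist in the task.
    unknown = sorted(set(chosen) - set(task.weights))
    if unknown:
        violations.append(f"unknown items: {unknown}")

    # Constraint 2: each item may be chosen at most once.
    if len(set(chosen)) != len(chosen):
        violations.append("duplicate items")

    # Constraint 3: total weight must not exceed capacity.
    total_weight = sum(task.weights[i] for i in set(chosen) if i in task.weights)
    if total_weight > task.capacity:
        violations.append(f"weight {total_weight} exceeds capacity {task.capacity}")

    if violations:
        return Verdict(False, violations, None, None)

    objective = sum(task.values[i] for i in chosen)
    return Verdict(True, [], objective, objective / task.optimal_value)


# Illustrative instance; the optimum (items b and c, value 65) was worked out by hand.
task = KnapsackTask(
    weights={"a": 3, "b": 4, "c": 2},
    values={"a": 30, "b": 50, "c": 15},
    capacity=6,
    optimal_value=65,
)

print(verify(task, ["a", "c"]))  # feasible but suboptimal
print(verify(task, ["a", "b"]))  # infeasible: weight 7 exceeds capacity 6
print(verify(task, ["b", "c"]))  # feasible and optimal
```

In this sketch, feasibility and optimality are reported separately, so a solution can be counted as feasible while still falling short of the optimum. That mirrors the distinction the findings draw between feasibility rate and the share of the optimal objective that feasible solutions reach.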
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
