A Reality Check of Language Models as Formalizers on Constraint Satisfaction Problems

Rikhil Amonkar; Ceyhun Efe Kayan; Qimei Lai; Ronan Le Bras; Li Zhang

arXiv:2505.13252·cs.CL·April 1, 2026

A Reality Check of Language Models as Formalizers on Constraint Satisfaction Problems

Rikhil Amonkar, Ceyhun Efe Kayan, Qimei Lai, Ronan Le Bras, Li Zhang

PDF

TL;DR

This paper evaluates the effectiveness of large language models as formalizers for constraint satisfaction problems, revealing significant limitations in their scalability and reasoning capabilities.

Contribution

It provides a systematic analysis of LLMs as formalizers across multiple benchmarks and models, highlighting their underperformance and challenges compared to direct solving.

Findings

01

LLMs as formalizers underperform LLMs as solvers in most cases.

02

Formalization quality degrades with increasing problem complexity.

03

Excessive reasoning tokens can lead to hard-coded solutions.

Abstract

Recent work shows superior performance when using large language models (LLMs) as formalizers instead of as end-to-end solvers for symbolic reasoning problems. Given the problem description, the LLM generates a formal program that derives a solution via an external solver. We systematically investigate the formalization capability of LLMs on real-life constraint satisfaction problems on 4 benchmarks, 6 LLMs, and 2 types of formal languages. We show that LLM-as-formalizer by no means trivializes the problem but underperforms LLM-as-solver in 15 out of 24 model-dataset combinations, despite the former's verifiability and interpretability. Although the formalization space is magnitudes smaller than the search space, our scaling analysis shows that LLM-as-formalizer still drastically degrades as problem complexity increases similar to LLM-as-solver. To better understand this limitation, we…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.