Auto-Formulating Dynamic Programming Problems with Large Language Models
Chenyu Zhou, Jingyuan Yang, Linwei Xin, Yitian Chen, Ziyan He, Dongdong Ge

TL;DR
This paper introduces DPLM, a specialized large language model for automating dynamic programming problem formulation, and presents DP-Bench, a benchmark for evaluating such models on textbook DP problems.
Contribution
The paper develops DPLM, a 7B-parameter LLM tailored for DP problems, and proposes DualReflect, a synthetic data generation pipeline that balances diversity and correctness.
Findings
DPLM, despite having only 7B parameters, achieves performance comparable to state-of-the-art LLMs such as OpenAI's o1 and DeepSeek-R1 on DP tasks.
DualReflect effectively scales training data using forward and backward generation methods.
Backward generation is more reliable in low-data regimes, while forward generation enhances diversity at scale.
Abstract
Dynamic programming (DP) is a fundamental method in operations research, but formulating DP models has traditionally required expert knowledge of both the problem context and DP techniques. Large Language Models (LLMs) offer the potential to automate this process. However, DP problems pose unique challenges due to their inherently stochastic transitions and the limited availability of training data. These factors make it difficult to directly apply existing LLM-based models or frameworks developed for other optimization problems, such as linear or integer programming. We introduce DP-Bench, the first benchmark covering a wide range of textbook-level DP problems to enable systematic evaluation. We present Dynamic Programming Language Model (DPLM), a 7B-parameter specialized model that achieves performance comparable to state-of-the-art LLMs like OpenAI's o1 and DeepSeek-R1, and surpasses…
