Scaling LLM Planning: NL2FLOW for Parametric Problem Generation and Rigorous Evaluation
Jungkoo Kang

TL;DR
This paper presents NL2Flow, an automated pipeline that generates workflow planning problems and evaluates LLM planning on them. It reports that translating problems into a structured representation improves success rates, and it analyzes where model errors come from.
Contribution
Introduction of NL2Flow, a scalable, automated method for generating and evaluating workflow planning problems using structured representations and natural language translation.
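As a rough illustration of what parametric generation from a structured representation can look like, the following minimal Python sketch builds a toy workflow problem from a few parameters and renders it as both natural language and a PDDL-style problem. It is not the NL2Flow generator: the names `Action`, `Problem`, `generate_problem`, `to_natural_language`, and `to_pddl_problem`, the `known` and `done-<action>` predicates, and the `workflow` domain name are all hypothetical and chosen only for this example.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Action:
    name: str
    inputs: List[str]   # variables the action requires
    outputs: List[str]  # variables the action produces


@dataclass
class Problem:
    actions: List[Action]
    known: List[str]  # variables available at the start
    goal: str         # name of the action that must be executed


def generate_problem(num_actions: int, num_vars: int, seed: int) -> Problem:
    """Sample a toy problem; the same parameters and seed give the same problem."""
    if num_actions < 1 or num_vars < 3:
        raise ValueError("need at least 1 action and 3 variables")
    rng = random.Random(seed)
    variables = [f"v{i}" for i in range(num_vars)]
    actions = []
    for i in range(num_actions):
        inputs = rng.sample(variables, k=rng.randint(1, 2))
        remaining = [v for v in variables if v not in inputs]
        outputs = rng.sample(remaining, k=1)
        actions.append(Action(f"a{i}", inputs, outputs))
    known = rng.sample(variables, k=max(1, num_vars // 2))
    goal = rng.choice(actions).name
    return Problem(actions, known, goal)


def to_natural_language(p: Problem) -> str:
    """Render the structured problem as plain English sentences."""
    lines = [
        f"Action {a.name} needs {', '.join(a.inputs)} "
        f"and produces {', '.join(a.outputs)}."
        for a in p.actions
    ]
    lines.append(f"Initially known: {', '.join(p.known)}.")
    lines.append(f"Goal: execute {p.goal}.")
    return "\n".join(lines)


def to_pddl_problem(p: Problem, name: str = "toy") -> str:
    """Render a PDDL-style problem block (hypothetical predicates and domain)."""
    objects = sorted(
        {v for a in p.actions for v in a.inputs + a.outputs} | set(p.known)
    )
    init = " ".join(f"(known {v})" for v in p.known)
    return (
        f"(define (problem {name}) (:domain workflow)\n"
        f"  (:objects {' '.join(objects)} - variable)\n"
        f"  (:init {init})\n"
        f"  (:goal (done-{p.goal})))"
    )


if __name__ == "__main__":
    problem = generate_problem(num_actions=3, num_vars=4, seed=0)
    print(to_natural_language(problem))
    print(to_pddl_problem(problem))
```

Because both renderings come from the same structured object, sweeping `num_actions`, `num_vars`, and `seed` would yield paired natural-language and formal versions of each problem, which is the property that makes automated evaluation against a symbolic planner possible in a setup like this.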
Findings
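The best-performing model achieved 86% success in generating valid plans and 69% in generating optimal plans (for solvable problems).
Translating problems into structured JSON improved success rates.
Regression analysis showed that the influence of problem characteristics on plan generation depends on both the model and the prompt design.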
Abstract
Robust workflow composition is critical for effective agent performance, yet progress in Large Language Model (LLM) planning and reasoning is hindered by a scarcity of scalable evaluation data. This work introduces NL2Flow, a fully automated pipeline for generating and evaluating workflow planning problems. NL2Flow generates problems parametrically in a structured intermediate representation, translating them into both natural language and formal PDDL. I evaluate several open-source, instruct-tuned LLMs on a dataset of 2296 low-difficulty problems generated by NL2Flow. Results demonstrate that the best-performing model achieved 86% success in generating valid plans and 69% in generating optimal plans (for solvable problems). Regression analysis shows that the influence of problem characteristics on plan generation is contingent on both model and prompt design. Importantly, translating…
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling · Artificial Intelligence in Healthcare and Education
