Chat2Workflow: A Benchmark for Generating Executable Visual Workflows with Natural Language
Yi Zhong, Buqiang Xu, Yijun Wang, Zifei Shan, Shuofei Qiao, Guozhou Zheng, Ningyu Zhang

TL;DR
Chat2Workflow introduces a benchmark and an agentic framework for automatically generating executable visual workflows from natural language, targeting the manual, error-prone workflow construction that is common in industrial deployments.
Contribution
The paper provides a benchmark built from a large collection of real-world business workflows, proposes a robust agentic framework, and evaluates how well language models generate practical workflows, highlighting their current limitations.
Findings
State-of-the-art models capture high-level intent but struggle with correctness and stability.
The agentic framework improves resolve rates by up to 5.34%.
Chat2Workflow serves as a foundation for industrial automation advancements.
Abstract
At present, executable visual workflows have emerged as a mainstream paradigm in real-world industrial deployments, offering strong reliability and controllability. However, in current practice, such workflows are almost entirely constructed through manual engineering: developers must carefully design workflows, write prompts for each step, and repeatedly revise the logic as requirements evolve, making development costly, time-consuming, and error-prone. To study whether large language models can automate this multi-round interaction process, we introduce Chat2Workflow, a benchmark for generating executable visual workflows directly from natural language, and propose a robust agentic framework to mitigate recurrent execution errors. Chat2Workflow is built from a large collection of real-world business workflows, with each instance designed so that the generated workflow can be…
