Chat2Workflow: A Benchmark for Generating Executable Visual Workflows with Natural Language

Yi Zhong; Buqiang Xu; Yijun Wang; Zifei Shan; Shuofei Qiao; Guozhou Zheng; Ningyu Zhang

arXiv:2604.19667·cs.CL·April 22, 2026

TL;DR

Chat2Workflow introduces a benchmark and an agentic framework for generating executable visual workflows from natural language, targeting the manual, error-prone workflow construction that is current practice in industrial settings.

Contribution

The paper provides a large real-world workflow dataset and a robust agentic framework, and it evaluates language models' ability to generate practical workflows, highlighting their current limitations.

Findings

01. State-of-the-art models capture high-level intent but struggle with correctness and stability.

02. The agentic framework improves resolve rates by up to 5.34%.

03. Chat2Workflow serves as a foundation for advancing industrial automation.

Abstract

At present, executable visual workflows have emerged as a mainstream paradigm in real-world industrial deployments, offering strong reliability and controllability. However, in current practice, such workflows are almost entirely constructed through manual engineering: developers must carefully design workflows, write prompts for each step, and repeatedly revise the logic as requirements evolve, making development costly, time-consuming, and error-prone. To study whether large language models can automate this multi-round interaction process, we introduce Chat2Workflow, a benchmark for generating executable visual workflows directly from natural language, and propose a robust agentic framework to mitigate recurrent execution errors. Chat2Workflow is built from a large collection of real-world business workflows, with each instance designed so that the generated workflow can be…

Code & Models

Repositories

zjunlp/Chat2Workflow
github

Datasets

zjunlp/Chat2Workflow-Evaluation
dataset · 3.1k downloads
