Automated BPMN Model Generation from Textual Process Descriptions: A Multi-Stage LLM-Driven Approach
Ion Matei, Maksym Zhenirovskyy, Praveen Kumar Menaka Sekar, Hon Yung Wong

TL;DR
This paper introduces a scalable, multi-stage LLM-driven pipeline that automatically generates and validates BPMN models from natural-language descriptions, addressing heterogeneous modeling conventions and multilingual sources.
Contribution
It presents a novel multi-stage process combining translation, validation, correction, and similarity assessment to reconstruct executable BPMN diagrams from text.
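The validation and correction stages can be pictured as a bounded repair loop. The sketch below is illustrative only and is not the authors' implementation: `validate` and `repair` are hypothetical callables standing in for an execution-oriented compliance check and an LLM-guided correction step, and `max_rounds` is an assumed parameter.

```python
from typing import Callable, List, Optional, Tuple


def repair_until_valid(
    xml: str,
    validate: Callable[[str], List[str]],      # hypothetical: returns a list of error messages
    repair: Callable[[str, List[str]], str],   # hypothetical: returns corrected XML
    max_rounds: int = 3,                       # assumed repair budget
) -> Tuple[Optional[str], int]:
    """Return (valid_xml, repairs_used), or (None, max_rounds) if the budget runs out."""
    for round_no in range(max_rounds + 1):
        errors = validate(xml)
        if not errors:
            # The model passed the check after round_no repair attempts.
            return xml, round_no
        if round_no == max_rounds:
            break
        # Feed the reported errors back into the correction step.
        xml = repair(xml, errors)
    return None, max_rounds
```

Models that still fail after the budget is exhausted would simply be dropped from the corpus, which is one way a validated set can end up smaller than the input set.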
Findings
Generated 387 validated ground-truth models from 750 diagrams.
Achieved average reconstruction similarity above 0.75.
Approximately 50 near-perfect reconstructions with minor variations.
Abstract
Automatically reconstructing BPMN models from unstructured natural-language descriptions remains challenging due to heterogeneous modeling conventions, multilingual sources, and the lack of reliable ground truth. We present a scalable, multi-stage LLM-driven pipeline that automates both ground-truth construction and model reconstruction. Multilingual BPMN XML files are translated into English, validated using execution-oriented compliance checks in SpiffWorkflow, and iteratively repaired through targeted LLM-guided corrections to produce a consistent ground-truth corpus. From these validated models, process descriptions are generated and used to reconstruct executable BPMN 2.0 XML diagrams without manual curation. We introduce a multi-dimensional similarity framework combining structural metrics, type-distribution alignment, and embedding-based semantic measures. In an empirical study…
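To make "type-distribution alignment" concrete, here is a minimal sketch of one plausible instantiation: count BPMN element types in two diagrams and compare the count vectors with cosine similarity. The abstract does not specify the paper's actual metric, so the choice of cosine similarity, the helper names `type_counts` and `cosine_similarity`, and the decision to skip container elements are all assumptions.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter

# Standard BPMN 2.0 model namespace.
BPMN_NS = "http://www.omg.org/spec/BPMN/20100524/MODEL"
# Assumption: container elements carry no type information worth comparing.
CONTAINERS = {"definitions", "process"}


def type_counts(bpmn_xml: str) -> Counter:
    """Count BPMN-namespaced elements by local tag name (task, startEvent, ...)."""
    counts: Counter = Counter()
    prefix = "{" + BPMN_NS + "}"
    for el in ET.fromstring(bpmn_xml).iter():
        if el.tag.startswith(prefix):
            local = el.tag[len(prefix):]
            if local not in CONTAINERS:
                counts[local] += 1
    return counts


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity of two count vectors; 0.0 if either is empty."""
    dot = sum(a[k] * b[k] for k in set(a) | set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

A score like this ignores how elements are connected, which is presumably why the framework pairs it with structural and embedding-based measures.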
