Automated BPMN Model Generation from Textual Process Descriptions: A Multi-Stage LLM-Driven Approach

Ion Matei; Maksym Zhenirovskyy; Praveen Kumar Menaka Sekar; Hon Yung Wong

arXiv:2604.12105 · cs.SE · April 15, 2026

TL;DR

This paper introduces a scalable, multi-stage LLM-driven pipeline that automatically generates and validates BPMN models from natural-language descriptions, addressing heterogeneous modeling conventions and multilingual sources.

Contribution

It presents a novel multi-stage process combining translation, validation, correction, and similarity assessment to reconstruct executable BPMN diagrams from text.

Findings

1. Generated 387 validated ground-truth models from 750 diagrams.

2. Achieved an average reconstruction similarity above 0.75.

3. Produced approximately 50 near-perfect reconstructions that differ only by minor variations.

Abstract

Automatically reconstructing BPMN models from unstructured natural-language descriptions remains challenging due to heterogeneous modeling conventions, multilingual sources, and the lack of reliable ground truth. We present a scalable, multi-stage LLM-driven pipeline that automates both ground-truth construction and model reconstruction. Multilingual BPMN XML files are translated into English, validated using execution-oriented compliance checks in SpiffWorkflow, and iteratively repaired through targeted LLM-guided corrections to produce a consistent ground-truth corpus. From these validated models, process descriptions are generated and used to reconstruct executable BPMN 2.0 XML diagrams without manual curation. We introduce a multi-dimensional similarity framework combining structural metrics, type-distribution alignment, and embedding-based semantic measures. In an empirical study…
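
To make the idea of type-distribution alignment concrete, here is a minimal sketch of one way such a score could be computed. It is an illustration under stated assumptions, not the paper's metric: the abstract does not give the formula, so the use of cosine similarity over element-type counts, the function names `element_type_counts` and `type_distribution_similarity`, and the two toy diagrams are all hypothetical. Only the Python standard library is used.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace of the BPMN 2.0 semantic model.
BPMN_NS = "http://www.omg.org/spec/BPMN/20100524/MODEL"


def element_type_counts(bpmn_xml: str) -> Counter:
    """Count direct children of every <process>, keyed by local tag name."""
    root = ET.fromstring(bpmn_xml)
    prefix = f"{{{BPMN_NS}}}"
    counts = Counter()
    for process in root.iter(f"{prefix}process"):
        for child in process:
            if child.tag.startswith(prefix):
                counts[child.tag[len(prefix):]] += 1
    return counts


def type_distribution_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity of two type-count vectors (assumed metric, see lead-in)."""
    dot = sum(a[k] * b[k] for k in set(a) | set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty model has no distribution to compare
    return dot / (norm_a * norm_b)


# Toy diagrams (hypothetical): the reconstruction is missing one task and one flow.
GROUND_TRUTH = f"""<definitions xmlns="{BPMN_NS}">
  <process id="p1">
    <startEvent id="s"/>
    <task id="t1"/>
    <task id="t2"/>
    <endEvent id="e"/>
    <sequenceFlow id="f1" sourceRef="s" targetRef="t1"/>
    <sequenceFlow id="f2" sourceRef="t1" targetRef="t2"/>
    <sequenceFlow id="f3" sourceRef="t2" targetRef="e"/>
  </process>
</definitions>"""

RECONSTRUCTED = f"""<definitions xmlns="{BPMN_NS}">
  <process id="p1">
    <startEvent id="s"/>
    <task id="t1"/>
    <endEvent id="e"/>
    <sequenceFlow id="f1" sourceRef="s" targetRef="t1"/>
    <sequenceFlow id="f2" sourceRef="t1" targetRef="e"/>
  </process>
</definitions>"""

gt_counts = element_type_counts(GROUND_TRUTH)
rec_counts = element_type_counts(RECONSTRUCTED)
similarity = type_distribution_similarity(gt_counts, rec_counts)
print(f"type-distribution similarity: {similarity:.3f}")
```

A score like this ignores how elements are connected, which is presumably why the framework pairs it with structural and embedding-based measures: two diagrams can share an identical mix of element types while describing different control flow.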
