Leveraging Machine Learning and Enhanced Parallelism Detection for BPMN Model Generation from Text

Phuong Nam Lê; Charlotte Schneider-Depré; Alexandre Goossens; Alexander Stevens; Aurélie Leribaux; Johannes De Smedt

arXiv:2507.08362 · cs.LG · July 14, 2025

TL;DR

This paper presents an automated method for converting textual process descriptions into BPMN models using machine learning, enhanced by a new annotated dataset that improves detection of parallel structures.

Contribution

It introduces a novel annotated dataset with parallel gateways and a pipeline leveraging machine learning and large language models for BPMN extraction from text.

Findings

01. Improved detection of parallel gateways in BPMN models.

02. Enhanced accuracy in process model reconstruction.

03. Effective use of a new annotated dataset for training.

Abstract

Efficient planning, resource management, and consistent operations often rely on converting textual process documents into formal Business Process Model and Notation (BPMN) models. However, this conversion remains time-intensive and costly. Existing approaches, whether rule-based or machine-learning-based, still struggle with writing styles and often fail to identify parallel structures in process descriptions. This paper introduces an automated pipeline for extracting BPMN models from text using machine learning and large language models. A key contribution of this work is a newly annotated dataset, which significantly enhances the training process. Specifically, we augment the PET dataset with 15 newly annotated documents containing 32 parallel gateways for model training, a critical feature often overlooked in existing datasets. This…
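To make the notion of a parallel structure concrete, the sketch below builds a small BPMN 2.0 XML fragment in which two activities run concurrently between an AND-split and an AND-join. It is not the paper's pipeline: the function `build_parallel_fragment`, the element ids, and the example activity names are all hypothetical, and the fragment omits diagram layout information. Only the BPMN element names (`parallelGateway`, `sequenceFlow`, `task`, and so on) and the model namespace come from the BPMN 2.0 standard.

```python
import xml.etree.ElementTree as ET

# Namespace of the BPMN 2.0 semantic model.
BPMN_NS = "http://www.omg.org/spec/BPMN/20100524/MODEL"


def build_parallel_fragment(before, branches, after):
    """Hypothetical helper: start -> before -> AND-split -> branches -> AND-join -> after -> end."""
    ET.register_namespace("bpmn", BPMN_NS)
    definitions = ET.Element(f"{{{BPMN_NS}}}definitions")
    process = ET.SubElement(definitions, f"{{{BPMN_NS}}}process", id="process_1")
    flow_count = 0

    def node(tag, node_id, name=None):
        # Add a flow node (event, task, or gateway) and return its id.
        attrs = {"id": node_id}
        if name is not None:
            attrs["name"] = name
        ET.SubElement(process, f"{{{BPMN_NS}}}{tag}", attrs)
        return node_id

    def flow(source, target):
        # Connect two flow nodes with a sequence flow.
        nonlocal flow_count
        flow_count += 1
        ET.SubElement(
            process,
            f"{{{BPMN_NS}}}sequenceFlow",
            {"id": f"flow_{flow_count}", "sourceRef": source, "targetRef": target},
        )

    start = node("startEvent", "start")
    first = node("task", "task_before", before)
    split = node("parallelGateway", "and_split")  # opens the parallel block
    join = node("parallelGateway", "and_join")    # waits for every branch
    last = node("task", "task_after", after)
    end = node("endEvent", "end")

    flow(start, first)
    flow(first, split)
    for i, name in enumerate(branches, start=1):
        branch = node("task", f"task_branch_{i}", name)
        flow(split, branch)
        flow(branch, join)
    flow(join, last)
    flow(last, end)
    return ET.tostring(definitions, encoding="unicode")


# Invented example sentence: "After the order is received, the clerk checks
# the stock while the accountant checks the customer's credit; then the
# order is shipped."
xml_text = build_parallel_fragment(
    "Receive order", ["Check stock", "Check credit"], "Ship order"
)
print(xml_text)
```

A cue such as "while" or "at the same time" is what an extraction method would need to map onto the two `parallelGateway` elements; missing it would typically yield a purely sequential model instead.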
