Leveraging Generative AI for Extracting Process Models from Multimodal Documents
Marvin Voelter, Raheleh Hadian, Timotheus Kampik, Marius Breitmayer, Manfred Reichert

TL;DR
This paper explores the use of Generative Pre-trained Transformers (GPTs) to automatically generate graphical process models from combined text and image inputs, providing a new dataset and evaluation framework.
Contribution
The paper introduces a novel multimodal dataset, evaluation metrics, and open-source code for assessing GPTs in process model generation from multimodal data.
Findings
GPTs show potential for semi-automated process modeling
Evaluation metrics enable systematic assessment
Open-source tools facilitate future research
Abstract
This paper presents an investigation of the capabilities of Generative Pre-trained Transformers (GPTs) to auto-generate graphical process models from multi-modal (i.e., text- and image-based) inputs. More precisely, we first introduce a small dataset as well as a set of evaluation metrics that allow for a ground truth-based evaluation of multi-modal process model generation capabilities. We then conduct an initial evaluation of commercial GPT capabilities using zero-, one-, and few-shot prompting strategies. Our results indicate that GPTs can be useful tools for semi-automated process modeling based on multi-modal inputs. More importantly, the dataset and evaluation metrics as well as the open-source evaluation code provide a structured framework for continued systematic evaluations moving forward.
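To make the evaluation setup more concrete, the following is a minimal sketch of how prompting with a varying number of examples and a ground-truth comparison could fit together. It is not the authors' code: the `ProcessModel` structure, the prompt wording, and the set-overlap F1 score are all assumptions chosen for illustration, and the abstract does not specify which metrics the paper actually uses. Image inputs are omitted; only the text part of a prompt is shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessModel:
    """Hypothetical, simplified process model: labelled nodes and directed flows."""
    nodes: frozenset
    flows: frozenset  # pairs of (source_label, target_label)


def build_prompt(task_text, examples):
    """Assemble a text prompt from a task and a list of (input, output) examples.

    An empty list gives a zero-shot prompt, one example a one-shot prompt,
    and several examples a few-shot prompt. The wording is an assumption.
    """
    parts = ["Generate a process model for the following description."]
    for i, (ex_input, ex_output) in enumerate(examples, start=1):
        parts.append(
            f"Example {i} input:\n{ex_input}\nExample {i} output:\n{ex_output}"
        )
    parts.append(f"Input:\n{task_text}\nOutput:")
    return "\n\n".join(parts)


def f1(generated, truth):
    """Set-overlap F1 between generated and ground-truth elements."""
    if not generated and not truth:
        return 1.0
    overlap = len(generated & truth)
    if overlap == 0:
        return 0.0
    precision = overlap / len(generated)
    recall = overlap / len(truth)
    return 2 * precision * recall / (precision + recall)


def score(generated, truth):
    """Compare a generated model with its ground truth, element type by element type."""
    return {
        "node_f1": f1(generated.nodes, truth.nodes),
        "flow_f1": f1(generated.flows, truth.flows),
    }
```

Scoring nodes and flows separately keeps the two kinds of error apart: a model may name the right activities yet connect them in the wrong order. Exact label matching is a deliberate simplification; a realistic evaluation would likely need some tolerance for paraphrased labels.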
Taxonomy
Topics: Business Process Modeling and Analysis · Semantic Web and Ontologies · Service-Oriented Architecture and Web Services
