Assessing the Business Process Modeling Competences of Large Language Models
Chantale Lauer, Peter Pfeiffer, Alexander Rombach, Nijat Mehdiyev

TL;DR
This paper introduces BEF4LLM, a comprehensive evaluation framework for assessing large language models' ability to generate business process models, revealing their strengths and limitations compared to human experts.
Contribution
The paper presents a novel framework for evaluating LLM-generated BPMN models and benchmarks multiple LLMs against human experts across key quality dimensions.
Findings
LLMs excel in syntactic and pragmatic quality
Humans outperform in semantic quality
The differences are modest, which suggests LLMs are competitive with human modelers
Abstract
The creation of Business Process Model and Notation (BPMN) models is a complex and time-consuming task requiring both domain knowledge and proficiency in modeling conventions. Recent advances in large language models (LLMs) have significantly expanded the possibilities for generating BPMN models directly from natural language, building upon earlier text-to-process methods with enhanced capabilities in handling complex descriptions. However, there is a lack of systematic evaluations of LLM-generated process models. Current efforts either use LLM-as-a-judge approaches or do not consider established dimensions of model quality. To this end, we introduce BEF4LLM, a novel LLM evaluation framework comprising four perspectives: syntactic quality, pragmatic quality, semantic quality, and validity. Using BEF4LLM, we conduct a comprehensive analysis of open-source LLMs and benchmark their…
Taxonomy
Topics: Business Process Modeling and Analysis · Artificial Intelligence in Law · Model-Driven Software Engineering Techniques
