Structured Extraction from Business Process Diagrams Using Vision-Language Models
Pritam Deka, Barry Devereux

TL;DR
This paper introduces a novel pipeline that uses vision-language models combined with OCR to extract structured representations of BPMN diagrams directly from images, enabling analysis without source files.
Contribution
The paper presents a method that combines VLMs with OCR to extract structured BPMN data from diagram images, so that components can be recovered even when the source XML files are unavailable.
Findings
OCR text enrichment improves component extraction for several of the benchmarked VLMs
Extraction performance varies across the benchmarked VLMs
Statistical analysis clarifies OCR impact
Abstract
Business Process Model and Notation (BPMN) is a widely adopted standard for representing complex business workflows. While BPMN diagrams are often exchanged as visual images, existing methods primarily rely on XML representations for computational analysis. In this work, we present a pipeline that leverages Vision-Language Models (VLMs) to extract structured JSON representations of BPMN diagrams directly from images, without requiring source model files or textual annotations. We also incorporate optical character recognition (OCR) for textual enrichment and evaluate the generated element lists against ground truth data derived from the source XML files. Our approach enables robust component extraction in scenarios where original source files are unavailable. We benchmark multiple VLMs and observe performance improvements in several models when OCR is used for text enrichment. In…
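The evaluation step described in the abstract, comparing a generated element list against ground truth derived from the source XML, can be sketched as follows. This is only an illustrative sketch and not the authors' implementation: the JSON layout (an "elements" list of objects with "type" and "name"), the subset of element types in ELEMENT_TYPES, the name normalization, and the exact-match precision/recall/F1 scoring are all assumptions made here. Only the BPMN 2.0 model namespace is taken from the standard.

```python
import json
import xml.etree.ElementTree as ET
from collections import Counter

# BPMN 2.0 model namespace as defined by the OMG specification.
BPMN_NS = "http://www.omg.org/spec/BPMN/20100524/MODEL"

# Assumption: a small subset of element types chosen for illustration.
ELEMENT_TYPES = {
    "task", "userTask", "serviceTask",
    "startEvent", "endEvent",
    "exclusiveGateway", "parallelGateway",
}


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so minor OCR spacing differences still match."""
    return " ".join(name.lower().split())


def ground_truth_elements(bpmn_xml: str) -> Counter:
    """Collect (type, name) pairs from BPMN XML as a multiset."""
    root = ET.fromstring(bpmn_xml)
    items = Counter()
    prefix = "{" + BPMN_NS + "}"
    for node in root.iter():
        if not node.tag.startswith(prefix):
            continue
        local = node.tag[len(prefix):]
        if local in ELEMENT_TYPES:
            items[(local, normalize(node.get("name", "")))] += 1
    return items


def predicted_elements(vlm_json: str) -> Counter:
    """Collect (type, name) pairs from the assumed (hypothetical) VLM JSON output."""
    data = json.loads(vlm_json)
    return Counter(
        (e["type"], normalize(e.get("name", ""))) for e in data["elements"]
    )


def score(pred: Counter, gold: Counter) -> tuple[float, float, float]:
    """Exact-match precision, recall, and F1 over the two multisets."""
    tp = sum((pred & gold).values())  # multiset intersection = true positives
    n_pred, n_gold = sum(pred.values()), sum(gold.values())
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical sample data for demonstration only.
SAMPLE_XML = """<definitions xmlns="http://www.omg.org/spec/BPMN/20100524/MODEL">
  <process id="p1">
    <startEvent id="s1"/>
    <task id="t1" name="Check order"/>
    <endEvent id="e1" name="Done"/>
    <sequenceFlow id="f1" sourceRef="s1" targetRef="t1"/>
  </process>
</definitions>"""

SAMPLE_JSON = json.dumps({
    "elements": [
        {"type": "startEvent", "name": ""},
        {"type": "task", "name": "check  order"},
        {"type": "exclusiveGateway", "name": "Approved?"},
    ]
})

if __name__ == "__main__":
    p, r, f = score(predicted_elements(SAMPLE_JSON), ground_truth_elements(SAMPLE_XML))
    print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Counting elements as a multiset keeps duplicates (for example, two unnamed gateways) from being collapsed into one. Exact matching on normalized names is a deliberately strict choice; a fuzzy string match would be more forgiving of OCR errors, at the cost of an extra threshold to tune.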
Taxonomy
Topics: Business Process Modeling and Analysis · Data Visualization and Analytics · Robotic Process Automation Applications
