PET: An Annotated Dataset for Process Extraction from Natural Language Text
Patrizio Bellan, Han van der Aa, Mauro Dragoni, Chiara Ghidini, Simone Paolo Ponzetto

TL;DR
The paper introduces PET, an annotated dataset of business process descriptions intended as a standard benchmark for evaluating process extraction methods.
Contribution
PET is presented as a first annotated corpus for process extraction, enabling objective comparison of extraction approaches and the use of data-driven natural language processing methods on business process descriptions.
Findings
The PET dataset includes annotations for activities, gateways, actors, and flow information.
Baseline experiments demonstrate the dataset's utility and the challenges in process extraction.
The dataset is publicly accessible for research and benchmarking.
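To make the annotation layers listed above concrete, here is a minimal sketch of how one annotated process description could be held in memory. The class names, field names, label strings, and the example sentence are all hypothetical and chosen for illustration; they are not PET's actual schema or file format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Span:
    """A labeled token span (hypothetical representation)."""
    label: str  # assumed label names: "Activity", "Gateway", "Actor"
    start: int  # index of the first token (inclusive)
    end: int    # index one past the last token (exclusive)


@dataclass(frozen=True)
class Relation:
    """A directed link between two spans, e.g. a flow (hypothetical)."""
    label: str
    source: Span
    target: Span


@dataclass
class AnnotatedDocument:
    tokens: List[str]
    spans: List[Span] = field(default_factory=list)
    relations: List[Relation] = field(default_factory=list)

    def text_of(self, span: Span) -> str:
        """Return the surface text covered by a span."""
        return " ".join(self.tokens[span.start:span.end])


# Invented example sentence, not taken from the dataset.
tokens = "The clerk checks the claim . If it is valid , the clerk approves it .".split()

actor = Span("Actor", 0, 2)          # "The clerk"
check = Span("Activity", 2, 3)       # "checks"
decision = Span("Gateway", 6, 7)     # "If"
approve = Span("Activity", 13, 14)   # "approves"

doc = AnnotatedDocument(
    tokens=tokens,
    spans=[actor, check, decision, approve],
    relations=[
        Relation("Flow", check, decision),
        Relation("Flow", decision, approve),
    ],
)

for rel in doc.relations:
    print(doc.text_of(rel.source), "->", doc.text_of(rel.target))
```

Keeping spans as token offsets rather than raw strings means the same word occurring twice in a text can still be annotated unambiguously.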
Abstract
Process extraction from text is an important task of process discovery, for which various approaches have been developed in recent years. However, in contrast to other information extraction tasks, there is a lack of gold-standard corpora of business process descriptions that are carefully annotated with all the entities and relationships of interest. Due to this, it is currently hard to compare the results obtained by extraction approaches in an objective manner, whereas the lack of annotated texts also prevents the application of data-driven information extraction methodologies, typical of the natural language processing field. Therefore, to bridge this gap, we present the PET dataset, a first corpus of business process descriptions annotated with activities, gateways, actors, and flow information. We present our new resource, including a variety of baselines to benchmark the…
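A gold-standard corpus is what allows extraction approaches to be compared objectively. The sketch below scores predicted spans against gold spans with exact-match precision, recall, and F1. This is a common choice in information extraction, assumed here for illustration; the excerpt does not state which metrics the authors use, and the function name `span_f1` and the example spans are hypothetical.

```python
from typing import Iterable, Tuple

# A span is (label, start_token, end_token); this tuple layout is an assumption.
SpanTuple = Tuple[str, int, int]


def span_f1(gold: Iterable[SpanTuple], predicted: Iterable[SpanTuple]):
    """Exact-match precision, recall, and F1 over labeled spans."""
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Invented gold and predicted annotations for a single text.
gold = [("Actor", 0, 2), ("Activity", 2, 3), ("Gateway", 6, 7), ("Activity", 13, 14)]
predicted = [("Actor", 1, 2), ("Activity", 2, 3), ("Gateway", 6, 7)]

precision, recall, f1 = span_f1(gold, predicted)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Exact matching is strict: the predicted actor span above covers only part of the gold span and therefore counts as both a false positive and a false negative.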
Taxonomy
Topics: Business Process Modeling and Analysis · Service-Oriented Architecture and Web Services · Semantic Web and Ontologies
