Generating Privacy-Preserving Process Data with Deep Generative Models
Keyi Li, Sen Yang, Travis M. Sullivan, Randall S. Burd, Ivan Marsic

TL;DR
This paper introduces ProcessGAN, a deep generative model using Transformer networks to produce synthetic process data that preserves privacy and maintains complex process structures, aiding research without exposing sensitive information.
Contribution
The paper presents ProcessGAN, a novel adversarial generative network tailored to process data that outperforms traditional models, especially on small, complex datasets.
Findings
ProcessGAN outperforms traditional models on small, complex datasets.
It better captures long-range dependencies in process data.
Synthetic data generated by ProcessGAN is indistinguishable from real data.
Abstract
Process data with confidential information cannot be shared directly in public, which hinders the research in process data mining and analytics. Data encryption methods have been studied to protect the data, but they still may be decrypted, which leads to individual identification. We experimented with different models of representation learning and used the learned model to generate synthetic process data. We introduced an adversarial generative network for process data generation (ProcessGAN) with two Transformer networks for the generator and the discriminator. We evaluated ProcessGAN and traditional models on six real-world datasets, of which two are public and four are collected in medical domains. We used statistical metrics and supervised learning scores to evaluate the synthetic data. We also used process mining to discover workflows for the authentic and synthetic datasets and…
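The abstract mentions statistical metrics for comparing synthetic and authentic data but, as excerpted here, does not name them. The sketch below is a minimal illustration only, assuming two simple stand-in metrics: the Jensen-Shannon divergence between activity frequency distributions and the difference in mean trace length. The function names and the toy activity labels are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import log2


def activity_distribution(traces):
    """Relative frequency of each activity over all traces (assumes at least one event)."""
    counts = Counter(activity for trace in traces for activity in trace)
    total = sum(counts.values())
    return {activity: n / total for activity, n in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) between two discrete distributions.

    Returns a value in [0, 1]: 0 for identical distributions,
    1 for distributions with disjoint support.
    """
    divergence = 0.0
    for activity in set(p) | set(q):
        pa = p.get(activity, 0.0)
        qa = q.get(activity, 0.0)
        mixture = 0.5 * (pa + qa)
        if pa > 0:
            divergence += 0.5 * pa * log2(pa / mixture)
        if qa > 0:
            divergence += 0.5 * qa * log2(qa / mixture)
    return divergence


def mean_length(traces):
    """Average number of activities per trace (assumes a non-empty list of traces)."""
    return sum(len(trace) for trace in traces) / len(traces)


# Hypothetical toy traces; the activity names are illustrative only.
authentic = [
    ["triage", "exam", "treatment", "discharge"],
    ["triage", "exam", "discharge"],
]
synthetic = [
    ["triage", "exam", "treatment", "discharge"],
    ["triage", "treatment", "discharge"],
]

jsd = js_divergence(activity_distribution(authentic), activity_distribution(synthetic))
length_gap = abs(mean_length(authentic) - mean_length(synthetic))
print(f"activity JS divergence: {jsd:.4f}")
print(f"mean trace length gap:  {length_gap:.2f}")
```

Lower values on both measures suggest that the synthetic traces resemble the authentic ones at the level of activity frequencies and trace lengths. Neither measure captures activity ordering or long-range dependencies, which is why the abstract pairs statistical metrics with supervised learning scores and process mining.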
Taxonomy
Topics: Business Process Modeling and Analysis · Machine Learning in Healthcare · Privacy-Preserving Technologies in Data
Methods: Multi-Head Attention · Linear Layer · Byte Pair Encoding · Softmax · Dense Connections · Residual Connection · Attention Is All You Need · Dropout · Layer Normalization · Adam
