Generating Privacy-Preserving Process Data with Deep Generative Models

Keyi Li; Sen Yang; Travis M. Sullivan; Randall S. Burd; Ivan Marsic

arXiv:2203.07949·cs.LG·March 16, 2022

Open Access

TL;DR

This paper introduces ProcessGAN, a deep generative model using Transformer networks to produce synthetic process data that preserves privacy and maintains complex process structures, aiding research without exposing sensitive information.

Contribution

The paper presents ProcessGAN, an adversarial generative network for process data that uses Transformer networks for both the generator and the discriminator. It outperforms traditional models, especially on small, complex datasets.

Findings

01 ProcessGAN outperforms traditional models on small, complex datasets.

02 It better captures long-range dependencies in process data.

03 Under the paper's statistical and supervised-learning evaluations, synthetic data generated by ProcessGAN is hard to distinguish from real data.

Abstract

Process data with confidential information cannot be shared directly in public, which hinders the research in process data mining and analytics. Data encryption methods have been studied to protect the data, but they still may be decrypted, which leads to individual identification. We experimented with different models of representation learning and used the learned model to generate synthetic process data. We introduced an adversarial generative network for process data generation (ProcessGAN) with two Transformer networks for the generator and the discriminator. We evaluated ProcessGAN and traditional models on six real-world datasets, of which two are public and four are collected in medical domains. We used statistical metrics and supervised learning scores to evaluate the synthetic data. We also used process mining to discover workflows for the authentic and synthetic datasets and…
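The abstract mentions statistical metrics for comparing synthetic and authentic process data without naming them in the visible excerpt. As a minimal sketch of what one such metric could look like, the snippet below compares activity frequency distributions using total variation distance. The metric choice, the function names, and the toy traces are all assumptions made for illustration; they are not taken from the paper.

```python
from collections import Counter
from typing import Dict, List, Sequence

# A trace is one process instance: an ordered sequence of activity labels.
Trace = Sequence[str]


def activity_distribution(traces: List[Trace]) -> Dict[str, float]:
    """Relative frequency of each activity across all traces."""
    counts = Counter(activity for trace in traces for activity in trace)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {activity: n / total for activity, n in counts.items()}


def total_variation(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Total variation distance between two discrete distributions (0 = identical)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


# Hypothetical toy data, not from the paper's datasets.
real = [
    ["register", "triage", "treat", "discharge"],
    ["register", "triage", "discharge"],
]
synthetic = [
    ["register", "triage", "treat", "discharge"],
    ["register", "treat", "discharge"],
]

distance = total_variation(
    activity_distribution(real), activity_distribution(synthetic)
)
print(f"activity-frequency distance: {distance:.3f}")
```

A distance near 0 means the synthetic traces use activities in roughly the same proportions as the authentic ones. This kind of check ignores activity order, so it says nothing about the long-range dependencies the paper emphasizes and would need to be paired with sequence-aware measures.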

Taxonomy

Topics: Business Process Modeling and Analysis · Machine Learning in Healthcare · Privacy-Preserving Technologies in Data

Methods: Multi-Head Attention · Linear Layer · Byte Pair Encoding · Softmax · Dense Connections · Residual Connection · Attention Is All You Need · Dropout · Layer Normalization · Adam