Towards Privacy-Preserving Relational Data Synthesis via Probabilistic Relational Models

Malte Luttermann; Ralf Möller; Mattis Hartwig

arXiv:2409.04194·cs.AI·October 3, 2024

TL;DR

This paper presents a pipeline for generating synthetic relational data using probabilistic relational models, addressing privacy concerns and data scarcity in machine learning tasks.

Contribution

The paper introduces a novel pipeline and a learning algorithm for constructing probabilistic relational models from relational databases, which can then be used for synthetic data generation.

Findings

1. Effective pipeline from database to probabilistic model
2. Successful sampling of synthetic relational data
3. Addresses privacy and data scarcity issues

Abstract

Probabilistic relational models provide a well-established formalism to combine first-order logic and probabilistic models, thereby making it possible to represent relationships between objects in a relational domain. At the same time, the field of artificial intelligence requires increasingly large amounts of relational training data for various machine learning tasks. Collecting real-world data, however, is often challenging due to privacy concerns, data protection regulations, high costs, and so on. To mitigate these challenges, the generation of synthetic data is a promising approach. In this paper, we solve the problem of generating synthetic relational data via probabilistic relational models. In particular, we propose a fully fledged pipeline to go from relational database to probabilistic relational model, which can then be used to sample new synthetic relational data points from its…

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Data Quality and Management · Cryptography and Data Security