Privacy-Preserving Machine Learning for Collaborative Data Sharing via Auto-encoder Latent Space Embeddings

Ana María Quintero-Ossa; Jesús Solano; Hernán Jarcía; David Zarruk; Alejandro Correa Bahnsen; Carlos Valencia

arXiv:2211.05717 · cs.LG · November 14, 2022

TL;DR

This paper introduces a framework that uses autoencoder-based latent space embeddings to enable privacy-preserving data sharing for collaborative machine learning, aiming to keep the original data private while improving model performance when multiple data sources are combined.

Contribution

It proposes a novel autoencoder-based representation learning method that allows organizations to share embedded data without exposing sensitive information.
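A minimal NumPy sketch of the local step described above: an organization fits an autoencoder on its own records and exports only the latent codes. The architecture (one tanh hidden layer, linear decoder), the hyperparameters, the synthetic table, and the helper names `train_autoencoder` and `encode` are assumptions for illustration, not the authors' implementation. The sketch shows the mechanics of producing embeddings only; it does not measure or guarantee privacy.

```python
import numpy as np

def train_autoencoder(x, latent_dim=2, lr=0.05, epochs=2000, seed=0):
    """Hypothetical helper: fit a one-hidden-layer autoencoder (tanh encoder,
    linear decoder) by full-batch gradient descent on squared reconstruction error."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    params = {
        "w_enc": rng.normal(scale=0.1, size=(d, latent_dim)),
        "b_enc": np.zeros(latent_dim),
        "w_dec": rng.normal(scale=0.1, size=(latent_dim, d)),
        "b_dec": np.zeros(d),
    }
    history = []
    for _ in range(epochs):
        z = np.tanh(x @ params["w_enc"] + params["b_enc"])   # latent codes
        err = z @ params["w_dec"] + params["b_dec"] - x        # reconstruction error
        history.append(0.5 * np.sum(err ** 2) / n)
        # Backpropagate the loss 0.5 * ||err||^2 / n through decoder and encoder.
        g_out = err / n
        g_pre = (g_out @ params["w_dec"].T) * (1.0 - z ** 2)   # tanh derivative
        params["w_dec"] -= lr * z.T @ g_out
        params["b_dec"] -= lr * g_out.sum(axis=0)
        params["w_enc"] -= lr * x.T @ g_pre
        params["b_enc"] -= lr * g_pre.sum(axis=0)
    return params, history

def encode(x, params):
    """Hypothetical helper: map raw records to latent embeddings (the shared part)."""
    return np.tanh(x @ params["w_enc"] + params["b_enc"])

# Hypothetical private table: 500 records, 5 correlated numeric features.
rng = np.random.default_rng(1)
raw = rng.normal(size=(500, 2)) @ rng.normal(size=(2, 5)) + 0.1 * rng.normal(size=(500, 5))
x = (raw - raw.mean(axis=0)) / raw.std(axis=0)   # standardize locally
params, history = train_autoencoder(x, latent_dim=2)
shared = encode(x, params)   # shape (500, 2): only this leaves the organization
print(f"reconstruction loss: {history[0]:.3f} -> {history[-1]:.3f}")
```

The decoder stays with the data owner in this sketch; only `shared` would be sent to collaborators.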

Findings

01 Enhanced privacy preservation in data sharing
02 Improved collaborative model performance
03 Effective autoencoder-based data embedding

Abstract

Privacy-preserving machine learning in data-sharing processes is an increasingly critical task that enables collaborative training of Machine Learning (ML) models without sharing the original data sources. It is especially relevant when an organization must ensure that sensitive data remains private throughout the whole ML pipeline, i.e., during both training and inference. This paper presents a framework that uses representation learning via autoencoders to generate privacy-preserving embedded data. Organizations can thus share these data representations to improve the performance of ML models in scenarios where more than one data source contributes to a shared downstream prediction task.
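For the multi-source scenario in the abstract, the following hedged sketch assumes two organizations hold different features for the same ID-aligned records (a vertically partitioned setup; how the paper combines sources is not stated in this summary). To keep the block self-contained, a PCA projection stands in for each organization's autoencoder, since a linear autoencoder trained on squared error learns the same principal subspace. The helper names (`pca_encoder`, `fit_logistic`, `accuracy`) and the synthetic data are hypothetical. A downstream logistic regression is trained once on organization A's embeddings alone and once on the concatenated embeddings of A and B.

```python
import numpy as np

def pca_encoder(x, latent_dim):
    """Stand-in for a trained autoencoder: project centred data onto its top
    principal directions (the subspace a linear autoencoder converges to)."""
    mean = x.mean(axis=0)
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    components = vt[:latent_dim].T                 # shape (d, latent_dim)
    return lambda data: (data - mean) @ components

def fit_logistic(z, y, lr=0.1, epochs=2000):
    """Plain logistic regression trained by full-batch gradient descent."""
    w, b = np.zeros(z.shape[1]), 0.0
    for _ in range(epochs):
        logits = np.clip(z @ w + b, -30, 30)       # clip to avoid exp overflow
        grad = 1.0 / (1.0 + np.exp(-logits)) - y
        w -= lr * z.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b

def accuracy(z, y, w, b):
    return float(((z @ w + b > 0).astype(int) == y).mean())

rng = np.random.default_rng(0)
n = 2000
s_a, s_b = rng.normal(size=n), rng.normal(size=n)  # hidden signals, one per source
y = (s_a + s_b > 0).astype(int)                     # shared downstream label
# Hypothetical vertically partitioned data: each organization holds its own
# feature block for the same ID-aligned rows.
x_a = np.outer(s_a, rng.normal(size=4)) + 0.1 * rng.normal(size=(n, 4))
x_b = np.outer(s_b, rng.normal(size=6)) + 0.1 * rng.normal(size=(n, 6))

train, test = slice(0, n // 2), slice(n // 2, n)
enc_a = pca_encoder(x_a[train], latent_dim=1)       # fitted locally by org A
enc_b = pca_encoder(x_b[train], latent_dim=1)       # fitted locally by org B
z_a, z_b = enc_a(x_a), enc_b(x_b)                   # only embeddings are shared
z_ab = np.hstack([z_a, z_b])

w1, b1 = fit_logistic(z_a[train], y[train])
w2, b2 = fit_logistic(z_ab[train], y[train])
acc_single = accuracy(z_a[test], y[test], w1, b1)
acc_joint = accuracy(z_ab[test], y[test], w2, b2)
print(f"org A only: {acc_single:.3f}  A+B embeddings: {acc_joint:.3f}")
```

Because the simulated label depends on both hidden signals, the concatenated embeddings should score noticeably higher than organization A's alone; that gap is built into the synthetic data and is not a result reported by the paper.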

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Data Quality and Management · Machine Learning in Healthcare