Privacy-Preserving Machine Learning for Collaborative Data Sharing via Auto-encoder Latent Space Embeddings
Ana María Quintero-Ossa, Jesús Solano, Hernán Jarcía, David Zarruk, Alejandro Correa Bahnsen, Carlos Valencia

TL;DR
This paper introduces a framework using autoencoder-based latent space embeddings to enable privacy-preserving data sharing for collaborative machine learning, ensuring data privacy while improving model performance across multiple sources.
Contribution
It proposes a novel autoencoder-based representation learning method that allows organizations to share embedded data without exposing sensitive information.
Findings
Enhanced privacy preservation in data sharing
Improved collaborative model performance
Effective autoencoder-based data embedding
Abstract
Privacy-preserving machine learning in data-sharing processes is a critical task: it enables collaborative training of Machine Learning (ML) models without sharing the original data sources. It is especially relevant when an organization must ensure that sensitive data remains private throughout the whole ML pipeline, i.e., during both the training and inference phases. This paper presents a framework that uses Representation Learning via autoencoders to generate privacy-preserving embedded data. Organizations can then share this data representation, rather than the raw data, to improve machine learning models' performance in scenarios where more than one data source contributes to a shared predictive downstream task.
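The data flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the tiny NumPy autoencoder, the synthetic data, the two hypothetical organizations `a` and `b`, and the assumption that they hold different feature columns for the same records are all choices made here for illustration. It shows only which arrays would leave each organization, and it makes no privacy measurement.

```python
import numpy as np

def standardize(X):
    # Zero mean, unit variance per feature.
    return (X - X.mean(axis=0)) / X.std(axis=0)

def train_autoencoder(X, latent_dim, epochs=300, lr=0.1, seed=0):
    """Fit a small autoencoder (tanh encoder, linear decoder) by gradient descent.

    Returns the encoder parameters and the reconstruction loss per epoch.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W_enc = rng.normal(0.0, 0.1, (d, latent_dim))
    b_enc = np.zeros(latent_dim)
    W_dec = rng.normal(0.0, 0.1, (latent_dim, d))
    b_dec = np.zeros(d)
    losses = []
    for _ in range(epochs):
        Z = np.tanh(X @ W_enc + b_enc)          # latent embedding
        err = Z @ W_dec + b_dec - X             # reconstruction error
        losses.append(float(np.mean(err ** 2)))
        g_out = 2.0 * err / err.size            # gradient of the mean squared error
        g_pre = (g_out @ W_dec.T) * (1.0 - Z ** 2)
        W_dec -= lr * (Z.T @ g_out)
        b_dec -= lr * g_out.sum(axis=0)
        W_enc -= lr * (X.T @ g_pre)
        b_enc -= lr * g_pre.sum(axis=0)
    return (W_enc, b_enc), losses

def encode(X, encoder):
    W_enc, b_enc = encoder
    return np.tanh(X @ W_enc + b_enc)

# Synthetic private data (assumption): two organizations hold different
# feature columns for the same 200 records.
rng = np.random.default_rng(0)
factors = rng.normal(size=(200, 2))
X_a = standardize(factors @ rng.normal(size=(2, 5)) + 0.1 * rng.normal(size=(200, 5)))
X_b = standardize(factors @ rng.normal(size=(2, 4)) + 0.1 * rng.normal(size=(200, 4)))

# Each organization trains its own autoencoder locally on its raw features.
encoder_a, losses_a = train_autoencoder(X_a, latent_dim=2, seed=1)
encoder_b, losses_b = train_autoencoder(X_b, latent_dim=2, seed=2)

# Only the latent embeddings are shared; raw features stay on site.
emb_a = encode(X_a, encoder_a)
emb_b = encode(X_b, encoder_b)

# The collaborating party trains its downstream model on the combined embeddings.
shared = np.hstack([emb_a, emb_b])
```

In this sketch the downstream model would be fit on `shared`, so neither `X_a` nor `X_b` is exchanged.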
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Data Quality and Management · Machine Learning in Healthcare
