Learning from aggregated data with a maximum entropy model
Alexandre Gilotte, Ahmed Ben Yahmed, David Rohde

TL;DR
This paper introduces a maximum entropy-based Markov Random Field model that learns from aggregated data alone, without access to individual data points, and achieves performance comparable to a logistic regression trained on the full data.
Contribution
It proposes a novel approach to learning from aggregated data: the unobserved feature distribution is approximated with a maximum entropy hypothesis, which yields a Markov Random Field model, and an MRF training algorithm is adapted and scaled to this setting.
Findings
The model achieves performance comparable to logistic regression trained on the full, unaggregated data.
An effective training algorithm for MRFs on aggregated data.
Empirical validation on multiple public datasets.
Abstract
Aggregating a dataset, then injecting some noise, is a simple and common way to release differentially private data. However, aggregated data, even without noise, is not an appropriate input for machine learning classifiers. In this work, we show how a new model, similar to a logistic regression, may be learned from aggregated data only by approximating the unobserved feature distribution with a maximum entropy hypothesis. The resulting model is a Markov Random Field (MRF), and we detail how to apply, modify and scale an MRF training algorithm to our setting. Finally, we present empirical evidence on several public datasets that the model learned this way can achieve performance comparable to that of a logistic model trained with the full unaggregated data.
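The core idea in the abstract, replacing the unobserved feature distribution with the maximum entropy distribution consistent with the released aggregates, can be illustrated with a small Python sketch. It is a toy under stated assumptions, not the authors' algorithm: features and label are binary, the only aggregates are per-feature (feature value, label) count tables, no privacy noise is added, and the model is fitted by plain gradient ascent with exact enumeration of all joint states instead of a scalable MRF training procedure. The function names aggregate, fit_maxent and predict_proba are hypothetical. Maximizing the log-linear likelihood against the aggregated marginals is the dual of the maximum entropy problem, so at convergence the model reproduces the aggregates exactly.

```python
import itertools

import numpy as np


def aggregate(X, y):
    """Hypothetical aggregation step: keep only count tables.

    counts[i, a, b] = number of rows with X[:, i] == a and y == b.
    Individual rows are discarded after this call.
    """
    d = X.shape[1]
    counts = np.zeros((d, 2, 2))
    for i in range(d):
        for a in (0, 1):
            for b in (0, 1):
                counts[i, a, b] = np.sum((X[:, i] == a) & (y == b))
    return counts


def fit_maxent(counts, steps=3000, lr=None):
    """Fit log P(x, y) = sum_i theta[i, x_i, y] - log Z by gradient ascent.

    The gradient is (aggregated marginal - model marginal), so the optimum
    is the maximum entropy distribution matching the aggregates.
    Exact enumeration of all 2**(d+1) states: only viable for tiny d.
    """
    d = counts.shape[0]
    # Step size 1/d keeps gradient ascent stable: the Hessian's eigenvalues
    # are bounded by d because each state activates exactly d indicators.
    step = lr if lr is not None else 1.0 / d
    target = counts / counts[0].sum()  # empirical P(x_i = a, y = b)
    states = np.array(list(itertools.product((0, 1), repeat=d + 1)))
    xs, ys = states[:, :d], states[:, d]
    theta = np.zeros((d, 2, 2))
    for _ in range(steps):
        # Unnormalised log-probability of every joint state (x, y).
        logp = sum(theta[i, xs[:, i], ys] for i in range(d))
        p = np.exp(logp - logp.max())
        p /= p.sum()
        # Model marginals P(x_i = a, y = b) under the current parameters.
        model = np.zeros_like(theta)
        for i in range(d):
            np.add.at(model[i], (xs[:, i], ys), p)
        theta += step * (target - model)
    return theta


def predict_proba(theta, X):
    """P(y = 1 | x): a sigmoid of summed per-feature weights."""
    idx = np.arange(X.shape[1])
    score = (theta[idx, X, 1] - theta[idx, X, 0]).sum(axis=1)
    return 1.0 / (1.0 + np.exp(-score))
```

With only these one-way aggregates the maximum entropy solution coincides with naive Bayes. Richer aggregates, such as counts over crossed feature pairs, would add pairwise potentials between features and give a genuine MRF over the features. Either way, P(y = 1 | x) remains a sigmoid of a sum of weights, which is the sense in which such a model resembles a logistic regression.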
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Machine Learning and Data Classification · Data Stream Mining Techniques
