Learning from aggregated data with a maximum entropy model

Alexandre Gilotte; Ahmed Ben Yahmed; David Rohde

arXiv:2210.02450·cs.LG·October 7, 2022

TL;DR

This paper introduces a maximum entropy-based Markov Random Field model that learns from aggregated data, enabling effective machine learning without access to individual data points, and achieves performance comparable to models trained on full data.

Contribution

It proposes a novel approach to learn from aggregated data using a maximum entropy hypothesis, resulting in a Markov Random Field model that performs well in practice.

Findings

01. The model achieves accuracy comparable to a logistic regression trained on the full, unaggregated data.

02. An MRF training algorithm is adapted and scaled to the aggregated-data setting.

03. The approach is validated empirically on several public datasets.

Abstract

Aggregating a dataset, then injecting some noise, is a simple and common way to release differentially private data. However, aggregated data -- even without noise -- is not an appropriate input for machine learning classifiers. In this work, we show how a new model, similar to a logistic regression, may be learned from aggregated data only by approximating the unobserved feature distribution with a maximum entropy hypothesis. The resulting model is a Markov Random Field (MRF), and we detail how to apply, modify and scale an MRF training algorithm to our setting. Finally, we present empirical evidence on several public datasets that the model learned this way can achieve performance comparable to that of a logistic model trained with the full unaggregated data.
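
As a rough illustration of the aggregation step the abstract starts from, the Python sketch below builds per-feature count tables from integer-coded categorical features and optionally perturbs them with Laplace noise. The function name `aggregate`, its parameters, and the restriction to single-feature tables are assumptions made for illustration. The noise scale is a free parameter that is not calibrated to any privacy budget, and nothing here is taken from the paper's repository.

```python
import numpy as np


def aggregate(X, y, n_values, noise_scale=0.0, seed=0):
    """Build one (counts, positives) table per categorical feature.

    X           : int array of shape (n_samples, n_features), values in [0, n_values[j])
    y           : binary labels of shape (n_samples,)
    n_values    : number of distinct values for each feature
    noise_scale : Laplace scale added to every cell (hypothetical, uncalibrated)
    """
    rng = np.random.default_rng(seed)
    tables = []
    for j, k in enumerate(n_values):
        # Number of examples that take each value of feature j.
        counts = np.bincount(X[:, j], minlength=k).astype(float)
        # Number of positive labels for each value of feature j.
        positives = np.bincount(X[:, j], weights=y, minlength=k)
        if noise_scale > 0:
            counts += rng.laplace(0.0, noise_scale, size=k)
            positives += rng.laplace(0.0, noise_scale, size=k)
        tables.append((counts, positives))
    return tables


# Toy data: 3 examples, 2 features with 2 possible values each.
X = np.array([[0, 1], [1, 1], [0, 0]])
y = np.array([1, 0, 1])
tables = aggregate(X, y, n_values=[2, 2])
```

Only the tables are released in this setting, so the individual rows of `X` and their joint feature distribution are no longer available to the learner. That missing distribution is what the abstract proposes to approximate with a maximum entropy hypothesis.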

Code & Models

Repositories

criteo-research/ad_click_prediction_from_aggregated_data
Official

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Machine Learning and Data Classification · Data Stream Mining Techniques