TL;DR
This paper introduces a privacy-preserving generative model for creating synthetic educational data, enabling research while protecting participant privacy, and provides an evaluation framework for comparing such models.
Contribution
It presents a generative model for educational data designed to preserve participant privacy, together with an evaluation framework for comparing synthetic data generators.
Findings
Naive pseudonymization can lead to re-identification risks.
The paper suggests techniques intended to guarantee privacy in synthetic data.
The method is evaluated on existing massive open educational datasets.
Abstract
Institutions collect massive learning traces but may not disclose them because of privacy concerns. Synthetic data generation opens new opportunities for research in education. In this paper we present a generative model for educational data that can preserve the privacy of participants, and an evaluation framework for comparing synthetic data generators. We show how naive pseudonymization can lead to re-identification threats and suggest techniques to guarantee privacy. We evaluate our method on existing massive educational open datasets.
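To make the re-identification threat concrete, the following is a minimal sketch of a linkage attack on a pseudonymized log. It is not the paper's method or data: the log, the pseudonyms (`u1`, `u2`, `u3`), the exercise ids, and the helper `candidates` are all hypothetical, invented for illustration. It assumes an attacker who already knows a few (exercise, outcome) facts about a target learner and looks for the pseudonyms whose traces are consistent with them.

```python
from collections import defaultdict

# Toy pseudonymized log: (pseudonym, exercise_id, answered_correctly).
# All values are invented for illustration; they do not come from the paper.
pseudonymized_log = [
    ("u1", "ex1", True), ("u1", "ex2", False), ("u1", "ex3", True),
    ("u2", "ex1", True), ("u2", "ex2", True), ("u2", "ex4", False),
    ("u3", "ex1", False), ("u3", "ex3", True), ("u3", "ex4", True),
]


def candidates(log, known_observations):
    """Return the pseudonyms whose trace contains every known (exercise, outcome) pair."""
    traces = defaultdict(set)
    for pseudonym, exercise, correct in log:
        traces[pseudonym].add((exercise, correct))
    known = set(known_observations)
    # A pseudonym stays a candidate only if all known facts appear in its trace.
    return sorted(p for p, trace in traces.items() if known <= trace)


# Hypothetical attacker knowledge about the target learner.
one_fact = candidates(pseudonymized_log, [("ex1", True)])
two_facts = candidates(pseudonymized_log, [("ex1", True), ("ex2", False)])

print(one_fact)   # two pseudonyms remain plausible
print(two_facts)  # a single pseudonym remains: the target is re-identified
```

In this toy log, one known fact leaves two candidate pseudonyms and a second fact narrows the set to one. Replacing names with pseudonyms does not help here, because the trace itself acts as a fingerprint.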
