Scaling While Privacy Preserving: A Comprehensive Synthetic Tabular Data Generation and Evaluation in Learning Analytics
Qinyi Liu, Mohammad Khalil, Ronas Shakya, and Jelena Jovanovic

TL;DR
This paper evaluates synthetic tabular data generation methods for learning analytics, demonstrating that they can balance privacy and utility effectively across different scenarios, and providing guidelines for their application.
Contribution
It offers a comprehensive evaluation framework for synthetic data in learning analytics, addressing privacy, utility, and resemblance, and provides tailored recommendations for different LA scenarios.
Findings
Synthetic data maintains predictive utility similar to that of real data.
Synthetic data can preserve privacy effectively.
Evaluation across three datasets demonstrates practical utility.
Abstract
Privacy poses a significant obstacle to the progress of learning analytics (LA), presenting challenges like inadequate anonymization and data misuse that current solutions struggle to address. Synthetic data emerges as a potential remedy, offering robust privacy protection. However, prior LA research on synthetic data lacks thorough evaluation, essential for assessing the delicate balance between privacy and data utility. Synthetic data must not only enhance privacy but also remain practical for data analytics. Moreover, diverse LA scenarios come with varying privacy and utility needs, making the selection of an appropriate synthetic data approach a pressing challenge. To address these gaps, we propose a comprehensive evaluation of synthetic data, which encompasses three dimensions of synthetic data quality, namely resemblance, utility, and privacy. We apply this evaluation to three…
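The three evaluation dimensions named in the abstract (resemblance, utility, and privacy) can be illustrated with a minimal sketch. The abstract does not say which metrics the paper uses, so everything below is an assumption chosen for illustration: resemblance as the gap between per-column means, utility as train-on-synthetic, test-on-real accuracy with a 1-nearest-neighbour classifier, and privacy as the median distance from each synthetic row to its closest real row. The function names are hypothetical, and the "synthetic" data is just a noisy copy of toy data, not the output of any generator.

```python
import math
import random
import statistics


def resemblance_gap(real, synth):
    """Mean absolute difference of per-column means (lower = closer match)."""
    n_cols = len(real[0])
    gaps = []
    for j in range(n_cols):
        real_mean = statistics.fmean(row[j] for row in real)
        synth_mean = statistics.fmean(row[j] for row in synth)
        gaps.append(abs(real_mean - synth_mean))
    return statistics.fmean(gaps)


def nearest_label(train_X, train_y, x):
    """Label of the training row closest to x (1-nearest-neighbour)."""
    best = min(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    return train_y[best]


def utility_tstr(synth_X, synth_y, real_X, real_y):
    """Train on synthetic, test on real: 1-NN accuracy on the real rows."""
    hits = sum(
        nearest_label(synth_X, synth_y, x) == y for x, y in zip(real_X, real_y)
    )
    return hits / len(real_X)


def privacy_dcr(real_X, synth_X):
    """Median distance from each synthetic row to its closest real row.

    Values near zero suggest synthetic rows that almost copy real records.
    """
    return statistics.median(
        min(math.dist(s, r) for r in real_X) for s in synth_X
    )


# Toy data (assumption): two numeric features and a binary label.
random.seed(0)
real_X = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(50)]
real_y = [int(x[0] + x[1] > 0) for x in real_X]

# Stand-in for a generator's output (assumption): real rows plus small noise.
synth_X = [[v + random.gauss(0, 0.1) for v in x] for x in real_X]
synth_y = list(real_y)

print("resemblance gap:", resemblance_gap(real_X, synth_X))
print("utility (TSTR accuracy):", utility_tstr(synth_X, synth_y, real_X, real_y))
print("privacy (median DCR):", privacy_dcr(real_X, synth_X))
```

In this sketch a smaller resemblance gap and a higher TSTR accuracy indicate more useful synthetic data, while a larger distance to the closest record indicates less risk of reproducing real rows. This is the kind of tension between utility and privacy that the abstract describes.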
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Traffic Prediction and Management Techniques · Data Stream Mining Techniques
