Training-Free Private Synthesis with Validation: A New Frontier for Practical Educational Data Sharing
Hibiki Ito, Chia-Yu Hsu, Hiroaki Ogata

TL;DR
This paper introduces a practical, training-free, LLM-based method for differentially private synthetic data generation (DP-SDG) in educational data sharing. The method is easier to implement than existing approaches and adds a validation step that carries moderate privacy risk.
Contribution
The paper proposes a two-stage approach that combines training-free, LLM-based DP-SDG with on-demand validation, reducing the engineering effort needed for educational data sharing.
Findings
LLM-based DP-SDG performs comparably to deep learning baselines.
The method significantly reduces engineering costs.
Moderate privacy leakage occurs during validation.
Abstract
While secondary use of real-world data (RWD) in education offers substantial research opportunities, data sharing is often limited by privacy constraints. Differentially private synthetic data generation (DP-SDG) has emerged as a possible solution. However, educational RWD is fragmented across platforms and institutions and stored in different formats, so DP-SDG must be tailored to each dataset, requiring substantial engineering effort. In addition, such data are often small-sample and high-dimensional, making deep learning (DL)-based methods common but difficult to implement without specialist expertise. In this setting, it is also hard to achieve practically useful downstream utility. As a result, despite its theoretical promise, DP-SDG remains far from a practical solution in education. To address this issue, we propose a more practical two-stage method: (1) training-free, LLM-based…
