Privately Fine-Tuned LLMs Preserve Temporal Dynamics in Tabular Data
Lucas Rosenblatt, Peihan Liu, Ryan McKenna, Natalia Ponomareva

TL;DR
This paper introduces PATH, a novel framework using privately fine-tuned large language models to generate synthetic longitudinal tabular data, effectively preserving temporal dynamics and dependencies that traditional methods fail to maintain.
Contribution
The paper presents PATH, a new generative approach that models entire user histories as sequences, capturing temporal dependencies in private synthetic data generation.
Findings
PATH reduces distributional distance to real data by over 60%
PATH decreases state transition errors by nearly 50%
PATH achieves similar marginal fidelity to existing methods
Abstract
Research on differentially private synthetic tabular data has largely focused on independent and identically distributed rows, where each record corresponds to a unique individual. This perspective neglects the temporal complexity of longitudinal datasets, such as electronic health records, where a user contributes an entire (sub)table of sequential events. While practitioners might attempt to model such data by flattening user histories into high-dimensional vectors for use with standard marginal-based mechanisms, we demonstrate that this strategy is insufficient. Flattening fails to preserve temporal coherence even when it maintains valid marginal distributions. We introduce PATH, a novel generative framework that treats the full table as the unit of synthesis and leverages the autoregressive capabilities of privately fine-tuned large language models. Extensive evaluations show that…
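To make the contrast in the abstract concrete, the following is a minimal sketch of the two data representations it describes: flattening a user's history into a fixed-width vector, versus keeping the whole history as one ordered sequence that an autoregressive model could consume. The toy events, the helper names (`group_by_user`, `flatten`, `serialize`), the padding token, and the `" -> "` text encoding are all assumptions made for illustration; none of them is taken from the paper, and PATH's actual serialization format and private fine-tuning procedure are not shown here.

```python
from collections import defaultdict

# Toy longitudinal table (hypothetical data): one row per event,
# several rows per user.
events = [
    {"user": "u1", "t": 0, "state": "admitted"},
    {"user": "u1", "t": 1, "state": "icu"},
    {"user": "u1", "t": 2, "state": "discharged"},
    {"user": "u2", "t": 0, "state": "admitted"},
    {"user": "u2", "t": 1, "state": "discharged"},
]


def group_by_user(rows):
    """Collect each user's states in time order."""
    histories = defaultdict(list)
    for row in sorted(rows, key=lambda r: (r["user"], r["t"])):
        histories[row["user"]].append(row["state"])
    return dict(histories)


def flatten(history, max_len, pad="<none>"):
    """Fixed-width vector: one column per time step, padded or truncated."""
    return (history + [pad] * max_len)[:max_len]


def serialize(history, sep=" -> "):
    """Hypothetical text encoding of a whole history as a single sequence."""
    return sep.join(history)


histories = group_by_user(events)
flat = {user: flatten(h, max_len=3) for user, h in histories.items()}
text = {user: serialize(h) for user, h in histories.items()}
```

In the flattened form, every time step becomes a separate column, so a mechanism that only matches low-order marginals over those columns is not directly constrained to reproduce the order in which states follow one another. In the serialized form, the ordering is part of the record itself, which is the property the abstract attributes to treating the full table as the unit of synthesis.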
Taxonomy
Topics: Machine Learning in Healthcare · Data Quality and Management · Privacy-Preserving Technologies in Data
