Federated Timeline Synthesis: Scalable and Private Methodology For Model Training and Deployment
Pawel Renc, Michal K. Grzeszczyk, Linglong Qian, Nassim Oufattole, Jeff Rasley, Arkadiusz Sitek

TL;DR
The paper introduces Federated Timeline Synthesis (FTS), a privacy-preserving, scalable framework for training generative models on distributed healthcare time series data, enabling effective clinical predictions and simulations without sharing sensitive patient information.
Contribution
FTS is a novel federated learning approach that encodes patient histories as tokenized timelines, allowing decentralized training and synthesis of large clinical datasets for healthcare applications.
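As a rough illustration of what encoding a patient history as a tokenized timeline could look like, the sketch below turns timestamped events into a flat token sequence. The time-gap buckets, quantile bins, token names, and the `tokenize_timeline` helper are all hypothetical choices made for this example; they are not taken from the paper.

```python
from bisect import bisect_right

# Hypothetical time-gap buckets (upper bounds in hours) and their token names.
GAP_BOUNDS = [1, 6, 24, 168]
GAP_TOKENS = ["GAP_<1h", "GAP_1-6h", "GAP_6-24h", "GAP_1-7d", "GAP_>7d"]


def gap_token(hours):
    """Map an elapsed time in hours to a coarse time-interval token."""
    return GAP_TOKENS[bisect_right(GAP_BOUNDS, hours)]


def value_token(value, cuts):
    """Map a continuous value to a bin token (Q1, Q2, ...) given sorted cut points."""
    return f"Q{bisect_right(cuts, value) + 1}"


def tokenize_timeline(events, cuts_by_code):
    """Convert (time_in_hours, code, value_or_None) events into a token list.

    Temporal information becomes gap tokens, categorical information is the
    event code itself, and continuous values become bin tokens.
    """
    tokens = []
    prev_time = None
    for time_h, code, value in sorted(events, key=lambda e: e[0]):
        if prev_time is not None and time_h > prev_time:
            tokens.append(gap_token(time_h - prev_time))
        tokens.append(code)
        if value is not None:
            tokens.append(value_token(value, cuts_by_code[code]))
        prev_time = time_h
    return tokens


# Made-up example patient; codes and cut points are illustrative only.
events = [
    (0.0, "ADMIT_ED", None),
    (0.5, "LAB_LACTATE", 3.1),
    (30.0, "ICU_ADMISSION", None),
]
cuts_by_code = {"LAB_LACTATE": [1.0, 2.0, 4.0]}
timeline = tokenize_timeline(events, cuts_by_code)
print(timeline)
```

Because every kind of information ends up as a discrete token, a standard autoregressive sequence model can be trained on such timelines without modality-specific input layers.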
Findings
Models trained on synthetic data perform comparably to models trained on real data.
FTS provides strong privacy guarantees and scalability across institutions.
Enables diverse healthcare prediction and simulation tasks.
Abstract
We present Federated Timeline Synthesis (FTS), a novel framework for training generative foundation models across distributed time-series data, applied to electronic health records (EHR). At its core, FTS represents patient history as tokenized Patient Health Timelines (PHTs), language-agnostic sequences encoding temporal, categorical, and continuous clinical information. Each institution trains an autoregressive transformer on its local PHTs and transmits only model weights to a central server. The server uses the generators to synthesize a large corpus of trajectories and train a Global Generator (GG), enabling zero-shot inference via Monte Carlo simulation of future PHTs. We evaluate FTS on five clinically meaningful prediction tasks using MIMIC-IV data, showing that models trained on synthetic data generated by GG perform comparably to those trained on real data. FTS offers strong…
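The pipeline in the abstract (local generators, a server-side synthetic corpus, a Global Generator, then Monte Carlo inference) can be sketched end to end with a toy model. This is a minimal sketch under strong assumptions: a bigram count table stands in for the autoregressive transformer, the count table plays the role of the transmitted model weights, and the class, function, and token names are all hypothetical.

```python
import random
from collections import Counter, defaultdict

START, END = "<START>", "<END>"


class BigramGenerator:
    """Toy stand-in for an autoregressive transformer (next token given previous token)."""

    def __init__(self):
        self.counts = defaultdict(Counter)  # plays the role of the "weights"

    def fit(self, timelines):
        for timeline in timelines:
            prev = START
            for tok in list(timeline) + [END]:
                self.counts[prev][tok] += 1
                prev = tok
        return self

    def sample(self, rng, prefix=(), max_len=50):
        """Continue `prefix` token by token until END or max_len is reached."""
        tokens = list(prefix)
        prev = tokens[-1] if tokens else START
        while len(tokens) < max_len:
            options = self.counts.get(prev)
            if not options:
                break
            nxt = rng.choices(list(options), weights=list(options.values()))[0]
            if nxt == END:
                break
            tokens.append(nxt)
            prev = nxt
        return tokens


def train_global_generator(site_datasets, n_synthetic_per_site, rng):
    # Step 1 (at each site): fit a local generator; only the fitted model leaves the site.
    local_generators = [BigramGenerator().fit(data) for data in site_datasets]
    # Step 2 (on the server): synthesize a corpus from the received generators.
    corpus = [g.sample(rng) for g in local_generators for _ in range(n_synthetic_per_site)]
    # Step 3 (on the server): train the Global Generator on synthetic data only.
    return BigramGenerator().fit(corpus)


def monte_carlo_risk(generator, history, event_token, n_sim, rng):
    """Estimate P(event occurs later | history) as the fraction of simulated futures containing it."""
    hits = 0
    for _ in range(n_sim):
        future = generator.sample(rng, prefix=history)[len(history):]
        hits += int(event_token in future)
    return hits / n_sim


# Made-up timelines for two sites; tokens are illustrative only.
site_a = [["ADMIT", "LAB_HIGH", "ICU", "DISCHARGE"], ["ADMIT", "LAB_LOW", "DISCHARGE"]]
site_b = [["ADMIT", "LAB_HIGH", "ICU", "DEATH"], ["ADMIT", "LAB_LOW", "DISCHARGE"]]

rng = random.Random(0)
gg = train_global_generator([site_a, site_b], n_synthetic_per_site=200, rng=rng)
icu_risk_high = monte_carlo_risk(gg, ["ADMIT", "LAB_HIGH"], "ICU", 500, rng)
icu_risk_low = monte_carlo_risk(gg, ["ADMIT", "LAB_LOW"], "ICU", 500, rng)
death_risk_high = monte_carlo_risk(gg, ["ADMIT", "LAB_HIGH"], "DEATH", 500, rng)
print(icu_risk_high, icu_risk_low, death_risk_high)
```

The same simulated futures can answer several questions at once (ICU admission, mortality, and so on), which is what makes the inference zero-shot: no task-specific head is trained. As the reviews below point out, sending a fitted generator is not formally private, and this toy makes the concern concrete because the count table directly encodes the local data.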
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
- The paper tackles a critical and high-impact problem: enabling collaborative, large-scale model training for healthcare while respecting the severe privacy and data-siloing constraints of the field.
- The proposed two-stage synthesis framework (local generators -> central synthetic corpus -> global generator) is an interesting alternative to traditional federated learning
- The paper repeatedly claims "strong privacy guarantees" in the abstract, introduction, and discussion, positioning this as a primary benefit. However, the authors explicitly contradict this in their contributions, stating, "we do not provide formal privacy guarantees", and again in the limitations: "it does not guarantee protection against potential attacks such as membership inference or model inversion". Sending trained generator weights is not formally private; these weights can contain
- The approach could meaningfully reduce privacy risks and regulatory barriers in federated clinical modeling, which is a major obstacle in real-world EHR applications.
- Evaluation across five clinically meaningful tasks (DRG, SOFA, readmission, ICU admission, mortality) demonstrates reasonable robustness. The inclusion of calibration and fidelity metrics (e.g., Unigram and DimWise R²) strengthens the empirical credibility.
- The paper does not compare against any other federated learning method as a baseline, so it is unclear whether the proposed method is optimal.
- The experiments are conducted on only one dataset, MIMIC. This limits generalizability to more heterogeneous or real-world multi-institutional setups. The “federated” setup is simulated rather than operationally federated across different institutions.
- The experiment settings are not comprehensive - the impact of number of clients and different amount
* Frames FL of time-series generators as **federated synthesis**: sharing compact trained synthesizers instead of gradient checkpoints or synthesized data, which distinguishes it from prior work.
* Provides a clear pipeline, reasonable ablations (temperature, data regimes), and fidelity checks (unigram/DimWise) that support fidelity preservation.
* Lowered communication cost of aggregating synthetic data across silos.
* Privacy evidence: No formal privacy guarantees or empirical audits are presented anywhere in the paper or appendix, despite privacy being the central motivation. Note that it is well acknowledged that even sharing a trained generator model creates privacy risks from model inversion, data reconstruction, and membership inference attacks [1].
* Prior art contrast: While the paper recognized prior work on federated learning for synthesizers [2] / aggregating synthetic data rather than gradien
1. The paper combines FL with transformer models by training local transformer generators and aggregating them for synthetic data generation.
2. This work proposes a multimodal integration that combines doctor notes, structured codes, images, measurements, and genomics.
3. It proposes a Zero-Shot Probabilistic Inference approach to generate sequential synthetic data. It claims this method can generate a sequence in an irregularly sampled way.
1. Unclear motivation and objective: It is difficult to discern whether the main contribution is an FL framework with transformers or timeline synthesis as a generative objective.
2. Unclear core concept: The term “timeline synthesis” should be clarified. The paper neither defines it explicitly nor cites prior work that does. It appears to be a completely new concept, but it lacks a clear definition.
3. Inadequate dataset: This work essentially proposes a framework which is working on cro
Taxonomy
Topics: Machine Learning in Healthcare · Privacy-Preserving Technologies in Data · Electronic Health Records Systems
