Masked Clinical Modelling: A Framework for Synthetic and Augmented Survival Data Generation

Nicholas I-Hsien Kuo; Blanca Gallego; Louisa Jorm

arXiv:2410.16811 · cs.LG · October 24, 2024

TL;DR

This paper introduces Masked Clinical Modelling (MCM), a framework inspired by masked language models for generating synthetic and augmented survival data that maintains clinical utility and improves survival-analysis performance.
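Masked language models hide some tokens in a sentence and train a model to reconstruct them. The sketch below shows one way that idea might carry over to tabular patient records. It assumes each record is a dict of numeric columns, masks a random subset of columns, and refills them conditioned on the columns that remain visible. It is not the authors' implementation. Instead of a trained reconstruction network, it uses a simple nearest-neighbour (hot-deck) donor. The function names `mask_record`, `fill_masked` and `synthesise`, the `fixed` argument (columns never masked, used here to illustrate conditional augmentation), and the toy columns are all hypothetical.

```python
import random
from typing import Dict, List, Sequence

# A patient record: column name -> numeric value (toy, hypothetical schema).
Record = Dict[str, float]


def mask_record(record: Record, mask_prob: float, rng: random.Random,
                fixed: Sequence[str] = ()) -> List[str]:
    """Choose columns to hide, analogous to masking tokens in a sentence.

    Columns listed in `fixed` are never masked, so they act as conditions.
    """
    candidates = [col for col in record if col not in fixed]
    masked = [col for col in candidates if rng.random() < mask_prob]
    if not masked and candidates:
        masked = [rng.choice(candidates)]  # always mask at least one column
    return masked


def fill_masked(record: Record, masked: List[str], reference: List[Record],
                rng: random.Random) -> Record:
    """Hypothetical stand-in for a trained reconstruction model.

    Copies the masked values from the reference row closest to `record` on
    the visible columns (squared Euclidean distance on raw scales).
    """
    visible = [col for col in record if col not in masked]

    def distance(other: Record) -> float:
        return sum((record[col] - other[col]) ** 2 for col in visible)

    best = min(distance(row) for row in reference)
    donor = rng.choice([row for row in reference if distance(row) == best])
    filled = dict(record)
    for col in masked:
        filled[col] = donor[col]
    return filled


def synthesise(data: List[Record], mask_prob: float = 0.5,
               fixed: Sequence[str] = (), seed: int = 0) -> List[Record]:
    """Produce one masked-and-refilled record per real record."""
    if len(data) < 2:
        raise ValueError("need at least two records to find a donor")
    rng = random.Random(seed)
    synthetic = []
    for i, row in enumerate(data):
        masked = mask_record(row, mask_prob, rng, fixed)
        # Exclude the row itself so refilled values come from another record.
        reference = data[:i] + data[i + 1:]
        synthetic.append(fill_masked(row, masked, reference, rng))
    return synthetic


# Invented toy data (not WHAS500); "event" is held fixed as the condition.
toy = [
    {"age": 60.0, "bmi": 25.0, "time": 100.0, "event": 1.0},
    {"age": 72.0, "bmi": 29.0, "time": 40.0, "event": 1.0},
    {"age": 55.0, "bmi": 22.0, "time": 300.0, "event": 0.0},
    {"age": 80.0, "bmi": 27.0, "time": 20.0, "event": 1.0},
]
print(synthesise(toy, mask_prob=0.5, fixed=("event",), seed=1))
```

The refilled values are copied verbatim from real donor rows, so this toy version offers no privacy protection. A learned model and a proper privacy evaluation would likely be needed in practice. Features are also compared on their raw scales, so standardising them first is usually advisable.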

Contribution

The paper presents a new framework for synthetic survival data generation that prioritizes data utility and clinical relevance over realism alone. It reports that the framework outperforms existing methods.

Findings

01. Improves discrimination and calibration in survival analysis
02. Preserves key clinical metrics such as hazard ratios
03. Outperforms existing data synthesis methods
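The first finding refers to discrimination. In survival analysis, discrimination is commonly summarised with Harrell's concordance index (C-index). This summary does not say which discrimination metric the paper uses, so that choice is an assumption here. The minimal stdlib sketch below uses a hypothetical function name, `harrell_c_index`, and assumes that a higher risk score means an earlier expected event. Pairs with tied observed times are simply skipped.

```python
from itertools import combinations
from typing import Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[int],
                    risk_scores: Sequence[float]) -> float:
    """Harrell's C-index for right-censored data (simplified sketch).

    A pair is comparable when the subject with the shorter observed time
    had the event. It is concordant when that subject also has the higher
    risk score; tied risk scores count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # tied observed times are skipped in this sketch
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # earlier subject censored: true ordering unknown
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Risk scores perfectly ordered with event times.
print(harrell_c_index([5, 10, 15, 20], [1, 1, 0, 1], [0.9, 0.7, 0.5, 0.1]))  # prints 1.0
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering. Packages such as lifelines and scikit-survival provide tested implementations. Calibration needs a separate measure, which is not shown here.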

Abstract

Access to real clinical data is often restricted due to privacy obligations, creating significant barriers for healthcare research. Synthetic datasets provide a promising solution, enabling secure data sharing and model development. However, most existing approaches focus on data realism rather than utility, that is, on ensuring that models trained on synthetic data yield clinically meaningful insights comparable to those from models trained on real data. In this paper, we present Masked Clinical Modelling (MCM), a framework inspired by masked language modelling, designed for both data synthesis and conditional data augmentation. We evaluate this prototype on the WHAS500 dataset using Cox Proportional Hazards models, focusing on the preservation of hazard ratios as key clinical metrics. Our results show that data generated using the MCM framework improves both discrimination and calibration in survival…
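The abstract treats hazard ratios from Cox Proportional Hazards models as the key clinical metric. For reference, the standard Cox model, with baseline hazard $h_0(t)$, covariate vector $\mathbf{x}$ and coefficients $\boldsymbol{\beta}$, implies the following hazard ratio for a one-unit increase in covariate $k$ (with $\mathbf{e}_k$ the $k$-th unit vector):

```latex
h(t \mid \mathbf{x}) = h_0(t)\,\exp\!\left(\boldsymbol{\beta}^\top \mathbf{x}\right),
\qquad
\mathrm{HR}_k = \frac{h(t \mid \mathbf{x} + \mathbf{e}_k)}{h(t \mid \mathbf{x})} = \exp(\beta_k)
```

Because $h_0(t)$ cancels in the ratio, $\mathrm{HR}_k$ does not depend on $t$. In this setting, "preserving" a hazard ratio presumably means that a Cox model fitted to the generated data yields $\exp(\hat{\beta}_k)$ close to the value fitted on the real data. The exact comparison criterion used in the paper is not given in this summary.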

Taxonomy

Topics: Machine Learning in Healthcare
