Pre-training transformer-based framework on large-scale pediatric claims data for downstream population-specific tasks
Xianlong Zeng, Simon Lin, and Chang Liu

TL;DR
This paper introduces Claim-PT, a pre-training framework for pediatric claims data that improves population-specific medical task performance and generalizes well across institutions by leveraging large-scale data and minimal fine-tuning.
Contribution
The study proposes a novel pre-training and fine-tuning approach for pediatric claims data, enhancing model performance on small cohorts and enabling knowledge transfer across healthcare institutions.
Findings
Outperforms task-specific models by more than 10%.
Effectively captures medical event semantics during pre-training.
Demonstrates strong transferability across different healthcare institutions.
Abstract
The adoption of electronic health records (EHR) has become universal during the past decade, which has enabled in-depth data-driven research. By learning from large amounts of healthcare data, various data-driven models have been built to predict future events for different medical tasks, such as automated diagnosis and heart-attack prediction. Although EHR data are abundant, the population that satisfies the specific criteria for a population-specific task is scarce, making it challenging to train data-hungry deep learning models. This study presents the Claim Pre-Training (Claim-PT) framework, a generic pre-training model that first trains on the entire pediatric claims dataset, followed by discriminative fine-tuning on each population-specific task. The semantic meaning of medical events can be captured in the pre-training stage, and the effective knowledge transfer is completed…
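The two-stage idea in the abstract (pre-train on all claims, then fine-tune on a small labeled cohort) can be illustrated with a minimal data-preparation sketch. This is not the authors' implementation. It assumes, purely for illustration, that pre-training uses a next-event prediction objective, which the abstract does not confirm. The patient IDs, the toy diagnosis codes, and the function names `build_vocab`, `pretraining_pairs`, and `finetuning_example` are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical toy data: each patient is a time-ordered list of claim codes.
PATIENTS: Dict[str, List[str]] = {
    "p1": ["J45", "R05", "J45"],
    "p2": ["E11", "I10"],
}


def build_vocab(patients: Dict[str, List[str]]) -> Dict[str, int]:
    """Map every distinct claim code to an integer id (sorted for determinism)."""
    codes = sorted({code for seq in patients.values() for code in seq})
    return {code: idx for idx, code in enumerate(codes)}


def pretraining_pairs(seq: List[str], vocab: Dict[str, int]) -> List[Tuple[List[int], int]]:
    """Assumed pre-training objective: predict the next event from its history.

    Needs no task labels, so every patient in the dataset contributes examples.
    """
    ids = [vocab[code] for code in seq]
    return [(ids[:t], ids[t]) for t in range(1, len(ids))]


def finetuning_example(seq: List[str], label: int, vocab: Dict[str, int]) -> Tuple[List[int], int]:
    """Fine-tuning input: the whole sequence paired with one task-specific label.

    Only the (small) cohort that has a label for the task contributes examples.
    """
    return [vocab[code] for code in seq], label


vocab = build_vocab(PATIENTS)
pretrain_set = [pair for seq in PATIENTS.values() for pair in pretraining_pairs(seq, vocab)]
finetune_set = [finetuning_example(PATIENTS["p1"], 1, vocab)]  # hypothetical label
```

The sketch shows why the split helps with scarce cohorts: the unlabeled pre-training set grows with every claim sequence, while the fine-tuning set needs only the few patients who meet the task criteria.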
Taxonomy
Topics: Machine Learning in Healthcare · Electronic Health Records Systems · Topic Modeling
Methods: Discriminative Fine-Tuning
