Pre-training transformer-based framework on large-scale pediatric claims data for downstream population-specific tasks

Xianlong Zeng, Simon Lin, and Chang Liu

arXiv:2106.13095·cs.LG·June 25, 2021·1 cites

Open Access

TL;DR

This paper introduces Claim-PT, a pre-training framework for pediatric claims data that improves population-specific medical task performance and generalizes well across institutions by leveraging large-scale data and minimal fine-tuning.

Contribution

The study proposes a novel pre-training and fine-tuning approach for pediatric claims data, enhancing model performance on small cohorts and enabling knowledge transfer across healthcare institutions.

Findings

01 Outperforms task-specific models by more than 10%.

02 Captures the semantics of medical events during pre-training.

03 Transfers well across different healthcare institutions.

Abstract

The adoption of electronic health records (EHR) has become universal during the past decade, which has enabled in-depth data-driven research. By learning from large amounts of healthcare data, various data-driven models have been built to predict future events for different medical tasks, such as automated diagnosis and heart-attack prediction. Although EHR data are abundant, the population that satisfies the specific criteria for a given population-specific task is small, making it challenging to train data-hungry deep learning models. This study presents the Claim Pre-Training (Claim-PT) framework, a generic pre-training model that first trains on the entire pediatric claims dataset and is then discriminatively fine-tuned on each population-specific task. The semantic meaning of medical events can be captured in the pre-training stage, and the effective knowledge transfer is completed…
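The two-stage workflow described in the abstract can be illustrated with a small data-preparation sketch. This is not the authors' code: the toy claim codes, the function names, and in particular the next-visit prediction objective used for the pre-training pairs are assumptions made for illustration. The abstract itself only states that pre-training uses the whole claims dataset and that fine-tuning uses each task's smaller cohort.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: a patient is a list of visits,
# and a visit is a list of claim codes.
Patient = List[List[str]]
Encoded = List[List[int]]


def build_vocab(patients: List[Patient]) -> Dict[str, int]:
    """Map every code seen in the full claims set to an integer id."""
    vocab: Dict[str, int] = {"<pad>": 0}
    for patient in patients:
        for visit in patient:
            for code in visit:
                vocab.setdefault(code, len(vocab))
    return vocab


def encode(patient: Patient, vocab: Dict[str, int]) -> Encoded:
    """Replace each claim code with its integer id."""
    return [[vocab[code] for code in visit] for visit in patient]


def pretraining_pairs(encoded: Encoded) -> List[Tuple[Encoded, List[int]]]:
    """Stage 1 (assumed objective): predict visit t+1 from visits 0..t."""
    return [(encoded[: t + 1], encoded[t + 1]) for t in range(len(encoded) - 1)]


def finetuning_examples(
    patients: List[Patient],
    labels: Dict[int, int],
    vocab: Dict[str, int],
) -> List[Tuple[Encoded, int]]:
    """Stage 2: keep only the labeled cohort, pairing each history with its label."""
    return [(encode(patients[i], vocab), y) for i, y in sorted(labels.items())]


# Toy data: every patient feeds pre-training; only patient 1 is in the task cohort.
all_patients: List[Patient] = [
    [["J45", "Z00"], ["J45"], ["R05"]],
    [["E10"], ["E10", "Z00"]],
]
vocab = build_vocab(all_patients)
stage1 = [p for pt in all_patients for p in pretraining_pairs(encode(pt, vocab))]
stage2 = finetuning_examples(all_patients, {1: 1}, vocab)
```

In this sketch the pre-training set draws on every patient's history, while the fine-tuning set contains only the labeled cohort, which mirrors the data imbalance the abstract describes between abundant claims data and scarce task-specific populations.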

Taxonomy

Topics: Machine Learning in Healthcare · Electronic Health Records Systems · Topic Modeling

Methods: Discriminative Fine-Tuning