Generalized and Transferable Patient Language Representation for Phenotyping with Limited Data
Yuqi Si, Elmer V Bernstam, Kirk Roberts

TL;DR
This paper introduces a multi-task pre-training and fine-tuning approach for learning generalized patient language representations. Pre-training on high-prevalence phenotypes improves phenotyping of low-prevalence diseases with limited data, and the resulting models are robust across disease categories.
Contribution
The study proposes a novel multi-task pre-training and fine-tuning method that enhances transferability and robustness of patient language representations across diverse phenotypes.
Findings
Multi-task pre-training improves learning efficiency.
Models achieve high performance on low-prevalence phenotypes.
Pre-trained models are robust across various disease categories.
Abstract
The paradigm of representation learning through transfer learning has the potential to greatly enhance clinical natural language processing. In this work, we propose a multi-task pre-training and fine-tuning approach for learning generalized and transferable patient representations from medical language. The model is first pre-trained with different but related high-prevalence phenotypes and further fine-tuned on downstream target tasks. Our main contribution focuses on the impact this technique can have on low-prevalence phenotypes, a challenging task due to the dearth of data. We validate the representation from pre-training, and fine-tune the multi-task pre-trained models on low-prevalence phenotypes including 38 circulatory diseases, 23 respiratory diseases, and 17 genitourinary diseases. We find multi-task pre-training increases learning efficiency and achieves consistently high…
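The two-stage recipe described in the abstract (pre-train a shared representation on several related tasks, then fine-tune on a target task) can be sketched in a few lines. This is only a toy illustration under stated assumptions, not the authors' model: the excerpt does not describe their architecture, so the single tanh layer, the logistic heads, the class name `SharedEncoderModel`, the task names, and the three-feature inputs are all hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class SharedEncoderModel:
    """Toy stand-in: one shared tanh layer plus one logistic head per task."""

    def __init__(self, n_features, n_hidden, tasks, seed=0):
        rng = random.Random(seed)
        # Shared encoder weights, reused by every phenotype task.
        self.W = [[rng.uniform(-0.1, 0.1) for _ in range(n_features)]
                  for _ in range(n_hidden)]
        # Task-specific output heads, one weight vector per task.
        self.heads = {t: [0.0] * n_hidden for t in tasks}

    def encode(self, x):
        return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in self.W]

    def predict(self, x, task):
        h = self.encode(x)
        return sigmoid(sum(v * hj for v, hj in zip(self.heads[task], h)))

    def sgd_step(self, x, y, task, lr=0.1):
        """One binary cross-entropy gradient step on a single example."""
        h = self.encode(x)
        head = self.heads[task]
        err = sigmoid(sum(v * hj for v, hj in zip(head, h))) - y
        # Gradient w.r.t. the hidden vector, taken before the head is updated.
        grad_h = [err * v for v in head]
        for j in range(len(head)):
            head[j] -= lr * err * h[j]
        for j, row in enumerate(self.W):
            g = grad_h[j] * (1.0 - h[j] ** 2)  # derivative of tanh
            for i in range(len(row)):
                row[i] -= lr * g * x[i]


def pretrain(model, datasets, epochs=20):
    """Stage 1: cycle over all pre-training tasks so the encoder is shared."""
    for _ in range(epochs):
        for task, examples in datasets.items():
            for x, y in examples:
                model.sgd_step(x, y, task)


def fine_tune(model, task, examples, epochs=20):
    """Stage 2: attach a fresh head for the target task and keep training."""
    model.heads[task] = [0.0] * len(model.W)
    for _ in range(epochs):
        for x, y in examples:
            model.sgd_step(x, y, task)


# Made-up data: each example is (feature vector with a trailing bias term, label).
pretrain_data = {
    "phenotype_A": [([1.0, 0.0, 1.0], 1), ([0.0, 1.0, 1.0], 0)],
    "phenotype_B": [([1.0, 1.0, 1.0], 1), ([0.0, 0.0, 1.0], 0)],
}
target_data = [([1.0, 0.0, 1.0], 1), ([0.0, 1.0, 1.0], 0)]

model = SharedEncoderModel(n_features=3, n_hidden=4, tasks=pretrain_data)
encoder_before = [row[:] for row in model.W]
pretrain(model, pretrain_data)
fine_tune(model, "rare_phenotype", target_data)
```

The design point the sketch is meant to show is that the encoder weights `W` are updated by every task during pre-training, while each phenotype keeps its own head, so a new low-prevalence task starts from a representation that has already seen related labels.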
