Textual Data Augmentation for Patient Outcomes Prediction
Qiuhao Lu, Dejing Dou, Thien Huu Nguyen

TL;DR
This paper introduces a novel method for augmenting clinical text data using GPT-2 to improve patient outcomes prediction, addressing data scarcity in healthcare applications.
Contribution
It proposes fine-tuning GPT-2 to generate labeled synthetic clinical notes, combined with a teacher-student framework in which a teacher trained on the original data guides a student trained on the augmented data.
Findings
Augmented data improves prediction accuracy for 30-day readmission.
The method effectively increases training data quality and quantity.
Deep models benefit from the augmented textual data.
Abstract
Deep learning models have demonstrated superior performance in various healthcare applications. However, the major limitation of these deep models is usually the lack of high-quality training data due to the private and sensitive nature of this field. In this study, we propose a novel textual data augmentation method to generate artificial clinical notes in patients' Electronic Health Records (EHRs) that can be used as additional training data for patient outcomes prediction. Essentially, we fine-tune the generative language model GPT-2 to synthesize labeled text with the original training data. More specifically, we propose a teacher-student framework where we first pre-train a teacher model on the original data, and then train a student model on the GPT-augmented data under the guidance of the teacher. We evaluate our method on the most common patient outcome, i.e., the 30-day…
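The abstract does not spell out how the teacher guides the student, so the following is only a minimal sketch of one common way to do it, a knowledge-distillation-style loss, and should not be read as the authors' objective. The function names, the mixing weight `alpha`, and the `temperature` parameter are all assumptions introduced for illustration. The student is penalized both for disagreeing with the label attached to a synthetic note and for diverging from the teacher's softened prediction on that note.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert raw scores to probabilities; higher temperature flattens them."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Negative log-probability assigned to the labeled class."""
    return -math.log(probs[label])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def student_loss(student_logits, teacher_logits, label,
                 alpha=0.5, temperature=2.0):
    """Hypothetical teacher-guided loss for one (synthetic) note.

    alpha weights the hard-label term; (1 - alpha) weights agreement
    with the teacher. Both values are illustrative assumptions.
    """
    hard = cross_entropy(softmax(student_logits), label)
    soft = kl_divergence(
        softmax(teacher_logits, temperature),
        softmax(student_logits, temperature),
    )
    return alpha * hard + (1 - alpha) * soft


if __name__ == "__main__":
    # Binary readmission example: class 1 = readmitted, class 0 = not.
    teacher = [0.2, 1.5]   # teacher leans toward "readmitted"
    student = [0.9, 0.4]   # student currently leans the other way
    print(student_loss(student, teacher, label=1))
```

Under this sketch, a synthetic note whose generated label the teacher disagrees with contributes a softer training signal, since the teacher term pulls the student away from the possibly noisy label.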
Taxonomy
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Cosine Annealing · Residual Connection · Byte Pair Encoding · Dropout · Attention Dropout · Linear Warmup With Cosine Annealing · Dense Connections
