Data augmentation method for modeling health records with applications to clopidogrel treatment failure detection

Sunwoong Choi; Samuel Kim

arXiv:2402.18046·cs.LG·February 29, 2024·2 cites

Open Access

TL;DR

This paper introduces a novel data augmentation technique for electronic health records that improves NLP-based modeling of patient data, especially in low-data scenarios, demonstrated through clopidogrel treatment failure detection.

Contribution

The paper proposes a data augmentation method that rearranges the order of medical records within a visit, where that order is not clearly defined, to improve NLP modeling of health records.

Findings

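01 Up to 5.3 percentage points of absolute ROC-AUC improvement (0.908 to 0.961) when augmentation is used during pre-training

02 Augmentation benefits are greater when labeled training data is limited

03 The method improves performance when applied during pre-training and during fine-tuning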

Abstract

We present a novel data augmentation method to address the challenge of data scarcity in modeling longitudinal patterns in Electronic Health Records (EHR) of patients using natural language processing (NLP) algorithms. The proposed method generates augmented data by rearranging the order of medical records within a visit, where the order of elements is not obvious, if there is one at all. Applying the proposed method to the clopidogrel treatment failure detection task enabled up to 5.3% absolute improvement in terms of ROC-AUC (from 0.908 without augmentation to 0.961 with augmentation) when it was used during the pre-training procedure. The augmentation also helped to improve performance during fine-tuning procedures, especially when the amount of labeled training data is limited.
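The within-visit rearrangement described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the patient representation (a chronological list of visits, each a list of code strings), the function names `augment_patient` and `augment_dataset`, the `copies` and `seed` parameters, and the example tokens are all assumptions made for illustration. The paper may also restrict which record types are shuffled, which this sketch does not model.

```python
import random
from typing import List, Optional

Visit = List[str]      # codes recorded in one visit (assumed representation)
Patient = List[Visit]  # visits in chronological order


def augment_patient(patient: Patient, rng: Optional[random.Random] = None) -> Patient:
    """Return a copy of the patient with codes shuffled inside each visit.

    The order of visits is left untouched; only the within-visit order,
    assumed here to carry no meaning, is permuted.
    """
    rng = rng or random.Random()
    augmented = []
    for visit in patient:
        shuffled = list(visit)  # copy so the input is not mutated
        rng.shuffle(shuffled)
        augmented.append(shuffled)
    return augmented


def augment_dataset(patients: List[Patient], copies: int = 2, seed: int = 0) -> List[Patient]:
    """Keep each original patient and add `copies` shuffled variants of it."""
    rng = random.Random(seed)
    out = []
    for patient in patients:
        out.append(patient)
        out.extend(augment_patient(patient, rng) for _ in range(copies))
    return out


if __name__ == "__main__":
    # Hypothetical tokens, not real EHR codes.
    patient = [
        ["dx:coronary_disease", "rx:clopidogrel", "lab:platelet_count"],
        ["dx:myocardial_infarction", "proc:stent"],
    ]
    for sequence in augment_dataset([patient], copies=2, seed=0):
        print(sequence)
```

Each augmented sequence contains the same codes in the same visits as the original, so a sequence model sees several orderings of one visit while the longitudinal structure across visits is preserved.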

Taxonomy

Topics: Machine Learning in Healthcare · Diabetes Management and Research · Artificial Intelligence in Healthcare