Self-supervision for health insurance claims data: a Covid-19 use case
Emilia Apostolova, Fazle Karim, Guido Muscioni, Anubhav Rana, Jeffrey Clyman

TL;DR
This paper adapts self-supervised learning to health insurance claims data, demonstrating improved Covid-19 hospitalization prediction accuracy and increased model trustworthiness by pre-training on claims history.
Contribution
It introduces a novel self-supervised pre-training approach for medical claims data, enhancing predictive performance and model reliability in healthcare applications.
Findings
Pre-training on claims data improves Covid-19 hospitalization prediction.
Pre-training enhances model trustworthiness and stability.
The pre-trained model outperforms baseline models in prediction performance.
Abstract
In this work, we modify and apply self-supervision techniques to the domain of medical health insurance claims. We model patients' healthcare claims history analogously to free-text narratives, and introduce pre-trained 'prior knowledge', later utilized for patient outcome predictions on a challenging task: predicting Covid-19 hospitalization, given a patient's pre-Covid-19 insurance claims history. Results suggest that pre-training on insurance claims not only produces better prediction performance, but, more importantly, improves the model's 'clinical trustworthiness' and model stability/reliability.
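To make the idea of treating a claims history like a free-text narrative concrete, here is a minimal sketch. It is not the authors' pipeline: the abstract does not state the token format or the self-supervised objective, so masked-token prediction is assumed here as one common choice. The record layout, the `DX:`/`PX:`/`RX:` prefixes, the example codes, and the helper names `claims_to_tokens` and `mask_tokens` are all hypothetical.

```python
import random

MASK = "[MASK]"  # hypothetical placeholder token, as used in masked-token objectives


def claims_to_tokens(claims):
    """Flatten a patient's claims into one date-ordered token sequence.

    Each claim is assumed to be a dict with an ISO-format "date" string
    and a list of "codes" (an illustrative layout, not the paper's).
    """
    tokens = []
    for claim in sorted(claims, key=lambda c: c["date"]):
        tokens.extend(claim["codes"])
    return tokens


def mask_tokens(tokens, mask_prob=0.15, seed=None):
    """Randomly hide tokens so a model can be trained to recover them.

    Returns (masked, labels): labels holds the original token at each
    masked position and None everywhere else.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels


# Illustrative history: diagnosis, procedure, and prescription codes.
history = [
    {"date": "2019-03-02", "codes": ["DX:E11.9", "PX:99213"]},
    {"date": "2019-01-15", "codes": ["DX:I10", "RX:lisinopril"]},
]
tokens = claims_to_tokens(history)
masked, labels = mask_tokens(tokens, mask_prob=0.5, seed=0)
```

Under these assumptions, a sequence model pre-trained to recover the hidden codes would carry that learned 'prior knowledge' into a downstream classifier, such as one predicting Covid-19 hospitalization.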
Taxonomy
Topics: Machine Learning in Healthcare · Artificial Intelligence in Healthcare · Big Data Technologies and Applications
