Deep Contextual Clinical Prediction with Reverse Distillation
Rohan S. Kodialam, Rebecca Boiarsky, Justin Lim, Neil Dixit, Aditya Sai, David Sontag

TL;DR
This paper introduces Reverse Distillation, a novel pretraining technique for deep models in clinical prediction, leveraging high-performing linear models to improve deep learning performance on insurance claims data.
Contribution
The paper proposes Reverse Distillation and the SARD architecture, which significantly enhance deep model performance in clinical outcome prediction tasks.
Findings
SARD outperforms state-of-the-art methods on multiple clinical prediction tasks.
Reverse distillation is identified as a key factor in performance improvements.
The approach leverages longitudinal insurance claims data effectively.
Abstract
Healthcare providers are increasingly using machine learning to predict patient outcomes to make meaningful interventions. However, despite innovations in this area, deep learning models often struggle to match performance of shallow linear models in predicting these outcomes, making it difficult to leverage such techniques in practice. In this work, motivated by the task of clinical prediction from insurance claims, we present a new technique called Reverse Distillation which pretrains deep models by using high-performing linear models for initialization. We make use of the longitudinal structure of insurance claims datasets to develop Self Attention with Reverse Distillation, or SARD, an architecture that utilizes a combination of contextual embedding, temporal embedding and self-attention mechanisms and most critically is trained via reverse distillation. SARD outperforms…
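The two-stage idea behind reverse distillation can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. It assumes one plausible reading of the abstract: a small network is first trained to mimic the predicted probabilities of a fitted linear model, then fine-tuned on the true labels. The abstract does not specify the pretraining loss, so the cross-entropy against soft targets used here is an assumption, as are the synthetic data, the one-hidden-layer student (standing in for a deep model such as SARD), and every function name and hyperparameter.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_linear_teacher(X, y, lr=0.5, steps=500):
    """Logistic regression by full-batch gradient descent (the linear 'teacher')."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        grad = sigmoid(X @ w + b) - y          # dLoss/dlogit per example
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

def init_student(d_in, d_hidden, rng):
    """Hypothetical one-hidden-layer network standing in for the deep model."""
    return {
        "W1": rng.normal(0.0, 0.1, (d_in, d_hidden)),
        "b1": np.zeros(d_hidden),
        "w2": rng.normal(0.0, 0.1, d_hidden),
        "b2": 0.0,
    }

def student_forward(params, X):
    hidden = np.tanh(X @ params["W1"] + params["b1"])
    return sigmoid(hidden @ params["w2"] + params["b2"]), hidden

def train_student(params, X, targets, lr=0.2, steps=1000):
    """Minimise cross-entropy against `targets`, which may be soft or hard labels."""
    n = len(targets)
    for _ in range(steps):
        probs, hidden = student_forward(params, X)
        g_logit = (probs - targets) / n                      # gradient at the output logit
        g_hidden = np.outer(g_logit, params["w2"]) * (1.0 - hidden ** 2)
        params["w2"] -= lr * (hidden.T @ g_logit)
        params["b2"] -= lr * g_logit.sum()
        params["W1"] -= lr * (X.T @ g_hidden)
        params["b1"] -= lr * g_hidden.sum(axis=0)
    return params

# Synthetic stand-in for claims features (assumption: not real data).
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
y = (X[:, 0] - 0.5 * X[:, 1] + 0.3 * rng.normal(size=400) > 0).astype(float)

# Fit the linear teacher and record its predicted probabilities.
w, b = fit_linear_teacher(X, y)
teacher_probs = sigmoid(X @ w + b)

# Stage 1 (reverse distillation, as assumed here): the student mimics the teacher.
student = init_student(5, 8, rng)
gap_before = np.mean(np.abs(student_forward(student, X)[0] - teacher_probs))
train_student(student, X, teacher_probs)
gap_after = np.mean(np.abs(student_forward(student, X)[0] - teacher_probs))

# Stage 2: fine-tune the pretrained student on the true labels.
train_student(student, X, y, lr=0.05, steps=200)
student_acc = np.mean((student_forward(student, X)[0] > 0.5) == (y == 1.0))
```

Comparing gap_before with gap_after shows how closely the student tracks the linear model after the first stage. The point of the design is that the deep model starts fine-tuning from roughly the linear model's level of performance rather than from a random initialization.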
Taxonomy
Topics: Machine Learning in Healthcare · COVID-19 diagnosis using AI · AI in cancer detection
