Unsupervised patient representations from clinical notes with interpretable classification decisions
Madhumita Sushil, Simon Šuster, Kim Luyckx, Walter Daelemans

TL;DR
This paper introduces unsupervised methods to generate dense, interpretable patient representations from clinical notes using autoencoders and paragraph vectors, and evaluates their effectiveness in supervised tasks.
Contribution
It presents novel unsupervised patient embedding techniques from clinical notes and explores their interpretability and feature significance in classification tasks.
Findings
Dense representations outperform sparse features in supervised tasks
Autoencoder features can be interpreted through input feature significance
Pretrained representations improve classification performance
Abstract
We make two main contributions in this work: 1. We explore the use of a stacked denoising autoencoder and a paragraph vector model to learn task-independent dense patient representations directly from clinical notes. We evaluate these representations by using them as features in multiple supervised setups, and compare their performance with that of sparse representations. 2. To understand and interpret the representations, we explore the best encoded features within the patient representations obtained from the autoencoder model. Further, we calculate the significance of the input features of the trained classifiers when we use these pretrained representations as input.
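To make the first contribution more concrete, the following is a minimal sketch of a single-layer denoising autoencoder that turns sparse bag-of-words patient vectors into dense ones. It is not the authors' implementation: the paper uses a stacked model, while this sketch has one hidden layer, and the masking noise, sigmoid encoder, linear decoder, hyperparameters, and toy data are all assumptions chosen for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_denoising_autoencoder(X, hidden_dim=3, noise_prob=0.2,
                                lr=1.0, epochs=500, seed=0):
    """Train a one-layer denoising autoencoder with plain gradient descent.

    All hyperparameters here are illustrative assumptions, not values
    taken from the paper.
    """
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    W_enc = rng.normal(0.0, 0.1, (n_features, hidden_dim))
    b_enc = np.zeros(hidden_dim)
    W_dec = rng.normal(0.0, 0.1, (hidden_dim, n_features))
    b_dec = np.zeros(n_features)
    losses = []

    for _ in range(epochs):
        # Corrupt the input by randomly zeroing features (masking noise).
        keep = rng.random(X.shape) >= noise_prob
        X_noisy = X * keep

        # Forward pass: encode the corrupted input, reconstruct the clean one.
        H = sigmoid(X_noisy @ W_enc + b_enc)
        X_hat = H @ W_dec + b_dec
        err = X_hat - X
        losses.append(float(np.mean(err ** 2)))

        # Backward pass for the mean squared reconstruction error.
        g_out = 2.0 * err / err.size
        g_W_dec = H.T @ g_out
        g_b_dec = g_out.sum(axis=0)
        g_hidden = (g_out @ W_dec.T) * H * (1.0 - H)
        g_W_enc = X_noisy.T @ g_hidden
        g_b_enc = g_hidden.sum(axis=0)

        W_enc -= lr * g_W_enc
        b_enc -= lr * g_b_enc
        W_dec -= lr * g_W_dec
        b_dec -= lr * g_b_dec

    def encode(X_new):
        # Dense representation: hidden activations of the uncorrupted input.
        return sigmoid(X_new @ W_enc + b_enc)

    return encode, losses


# Toy data (hypothetical): 6 patients x 8 vocabulary terms, binary bag-of-words.
X_toy = np.array([
    [1, 1, 0, 0, 1, 0, 0, 0],
    [1, 0, 1, 0, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 1, 1, 0],
    [0, 0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 1, 1, 1],
], dtype=float)

encode, losses = train_denoising_autoencoder(X_toy)
patient_vectors = encode(X_toy)  # shape: (6 patients, 3 dense features)
```

The resulting `patient_vectors` could then be passed as features to any downstream classifier, which is the evaluation setup the abstract describes.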
Taxonomy
Topics: Machine Learning in Healthcare · Colorectal Cancer Screening and Detection · Biomedical Text Mining and Ontologies
