Transferring Knowledge from Text to Predict Disease Onset
Yun Liu, Kun-Ta Chuang, Fu-Wen Liang, Huey-Jen Su, Collin M. Stultz, John V. Guttag

TL;DR
This paper introduces a method that leverages domain-specific word embeddings to rescale features based on their relevance, improving disease onset prediction accuracy and interpretability, especially with limited positive examples.
Contribution
The novel approach uses word2vec relevance estimates to rescale features, enhancing model accuracy and reducing feature count in disease prediction tasks.
Findings
Improved prediction accuracy, particularly when positive examples are scarce
Selected 60% fewer features, easing interpretation by physicians
Applicable to other domains where features and outcomes have text descriptions
Abstract
In many domains such as medicine, training data is in short supply. In such cases, external knowledge is often helpful in building predictive models. We propose a novel method to incorporate publicly available domain expertise to build accurate models. Specifically, we use word2vec models trained on a domain-specific corpus to estimate the relevance of each feature's text description to the prediction problem. We use these relevance estimates to rescale the features, causing more important features to experience weaker regularization. We apply our method to predict the onset of five chronic diseases in the next five years in two genders and two age groups. Our rescaling approach improves the accuracy of the model, particularly when there are few positive examples. Furthermore, our method selects 60% fewer features, easing interpretation by physicians. Our method is applicable to other…
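The rescaling idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact relevance formula, so everything specific here is an assumption made for illustration: relevance is taken to be the cosine similarity between a feature's description embedding and the outcome's embedding, the embeddings are hand-written toy vectors rather than output from a trained word2vec model, and the names `relevance_scores`, `rescale_features`, and the `floor` parameter are hypothetical.

```python
import numpy as np


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def relevance_scores(feature_vecs, outcome_vec, floor=0.05):
    """Assumed relevance: cosine similarity to the outcome embedding.

    Scores are clipped to [floor, 1] so that no feature is scaled to
    zero or flipped in sign. The floor value is an arbitrary choice.
    """
    sims = np.array([cosine_similarity(f, outcome_vec) for f in feature_vecs])
    return np.clip(sims, floor, 1.0)


def rescale_features(X, relevance):
    """Multiply each feature column by its relevance score."""
    return X * relevance  # broadcasts the score vector across rows


# Toy embeddings standing in for word2vec vectors of text descriptions.
feature_vecs = np.array([
    [0.9, 0.1, 0.0],  # description close to the outcome
    [0.0, 1.0, 0.0],  # description unrelated to the outcome
    [0.7, 0.6, 0.1],  # description partly related
])
outcome_vec = np.array([1.0, 0.0, 0.0])

# Toy design matrix: rows are patients, columns are features.
X = np.array([
    [1.0, 1.0, 1.0],
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
])

relevance = relevance_scores(feature_vecs, outcome_vec)
X_scaled = rescale_features(X, relevance)
```

Under this sketch, fitting a penalized linear model on `X_scaled` means that a coefficient `w_scaled` on a rescaled column corresponds to `w = w_scaled * r` on the original column, so the penalty on `w_scaled` acts like a penalty on `w / r`. A larger relevance `r` therefore gives a weaker effective penalty, which matches the abstract's statement that more important features experience weaker regularization.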
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
