Relevant Word Order Vectorization for Improved Natural Language Processing in Electronic Healthcare Records
Jeffrey Thompson, Jinxiang Hu, Dinesh Pal Mudaranthakam, David Streeter, Lisa Neums, Michele Park, Devin C. Koestler, Byron Gajewski, Matthew S. Mayo

TL;DR
This paper introduces Relevant Word Order Vectorization (RWOV), a novel text-structuring algorithm for electronic health records (EHR) that improves classification accuracy by accounting for the repetition and order of key words in unstructured medical text.
Contribution
The paper presents RWOV, a new vectorization method designed specifically for EHR free text, and shows that it outperforms existing approaches such as n-grams and word2vec in classifying breast cancer hormone receptor status.
Findings
RWOV achieved higher F1 scores and AUC than baseline methods.
Considering key word order improves classification performance.
RWOV shows promise for structuring unstructured healthcare text.
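To make the summary's point about key word repetition and order concrete, here is a toy Python sketch. It is not the published RWOV algorithm, whose details are not given on this page. The keyword list `KEYWORDS` and the helpers `keyword_order_vector` and `keyword_count_vector` are hypothetical, chosen only for illustration. The sketch keeps only "relevant" words, preserving their order and repeats, and compares the result with an order-free keyword count. Two report fragments with opposite meanings get identical counts but different order vectors.

```python
import re

# Hypothetical relevant-word vocabulary (an assumption; the paper's list is not given here).
KEYWORDS = ["er", "pr", "positive", "negative"]
# Map each keyword to a positive integer ID; 0 is reserved for padding.
INDEX = {w: i + 1 for i, w in enumerate(KEYWORDS)}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_order_vector(text, max_len=8):
    """Keep only keywords, preserving order and repetition, then pad or truncate to max_len."""
    ids = [INDEX[t] for t in tokenize(text) if t in INDEX]
    ids = ids[:max_len]
    return ids + [0] * (max_len - len(ids))


def keyword_count_vector(text):
    """Order-free baseline: how many times each keyword appears."""
    tokens = tokenize(text)
    return [tokens.count(w) for w in KEYWORDS]


a = "ER negative, PR positive."
b = "ER positive, PR negative."
print(keyword_count_vector(a) == keyword_count_vector(b))  # True: counts cannot tell them apart
print(keyword_order_vector(a))  # [1, 4, 2, 3, 0, 0, 0, 0]
print(keyword_order_vector(b))  # [1, 3, 2, 4, 0, 0, 0, 0]
```

The fixed-length, zero-padded integer vector is one simple way to hand ordered keyword sequences to a standard classifier. The padding length `max_len` is likewise an illustrative assumption.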
Abstract
Objective: Electronic health records (EHR) represent a rich resource for conducting observational studies, supporting clinical trials, and more. However, much of the relevant information is stored in an unstructured format that makes it difficult to use. Natural language processing approaches that attempt to automatically classify the data depend on vectorization algorithms that impose structure on the text, but these algorithms were not designed for the unique characteristics of EHR. Here, we propose a new algorithm for structuring so-called free text that may help researchers make better use of EHR. We call this method Relevant Word Order Vectorization (RWOV). Materials and Methods: As a proof-of-concept, we attempted to classify the hormone receptor status of breast cancer patients treated at the University of Kansas Medical Center during a recent year, from the unstructured text…
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Topic Modeling · Machine Learning in Healthcare
