DreamNLP: Novel NLP System for Clinical Report Metadata Extraction using Count Sketch Data Streaming Algorithm: Preliminary Results
Sanghyun Choi, Nikita Ivkin, Vladimir Braverman, Michael A. Jacobs

TL;DR
DreamNLP introduces a memory-efficient data streaming algorithm to extract key clinical report terms from large EHR datasets, aiding downstream medical analysis and machine learning applications.
Contribution
It presents a novel application of the Count Sketch algorithm to term extraction from clinical reports, using less memory than conventional counting approaches.
Findings
Successfully identified important breast diagnosis features from EHRs
Demonstrated low-memory, scalable term extraction method
Potential to enhance machine learning in precision medicine
Abstract
Extracting information from electronic health records (EHR) is a challenging task since it requires prior knowledge of the reports and a natural language processing (NLP) algorithm. With the growing number of EHR implementations, such knowledge is increasingly challenging to obtain in an efficient manner. We address this challenge by proposing a novel methodology to analyze large sets of EHRs using a modified Count Sketch data streaming algorithm termed DreamNLP. By using DreamNLP, we generate a dictionary of frequently occurring terms, or heavy hitters, in the EHRs using less computational memory than the conventional counting approach that other NLP programs use. We demonstrate the extraction of the most important breast diagnosis features from the EHRs in a set of patients that underwent breast imaging. Based on the analysis, extraction of these terms would be useful for defining…
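To make the heavy-hitter idea concrete, here is a minimal sketch of a textbook Count Sketch in Python. It is not the authors' modified DreamNLP algorithm, whose details are not given in this summary. The class name, the `heavy_hitters` helper, the table dimensions, the BLAKE2b-based hashing, and the toy token stream are all assumptions chosen for illustration.

```python
import hashlib
import statistics


class CountSketch:
    """Textbook Count Sketch: depth x width signed counters (illustrative only)."""

    def __init__(self, depth=5, width=1024):
        self.depth = depth
        self.width = width
        self.table = [[0] * width for _ in range(depth)]

    def _bucket_and_sign(self, row, term):
        # One salted hash per row; the low bit gives the sign, the rest the bucket.
        digest = hashlib.blake2b(
            term.encode("utf-8"), digest_size=8, salt=row.to_bytes(8, "little")
        ).digest()
        value = int.from_bytes(digest, "little")
        sign = 1 if value & 1 else -1
        bucket = (value >> 1) % self.width
        return bucket, sign

    def update(self, term, count=1):
        for row in range(self.depth):
            bucket, sign = self._bucket_and_sign(row, term)
            self.table[row][bucket] += sign * count

    def estimate(self, term):
        # The median over rows limits the effect of hash collisions.
        estimates = []
        for row in range(self.depth):
            bucket, sign = self._bucket_and_sign(row, term)
            estimates.append(sign * self.table[row][bucket])
        return statistics.median(estimates)


def heavy_hitters(terms, k=2, depth=5, width=1024):
    """Single pass over the stream, keeping only the k best candidates."""
    sketch = CountSketch(depth, width)
    candidates = {}
    for term in terms:
        sketch.update(term)
        candidates[term] = sketch.estimate(term)
        if len(candidates) > k:
            # Evict the candidate with the smallest estimated count.
            del candidates[min(candidates, key=candidates.get)]
    return sorted(candidates.items(), key=lambda kv: -kv[1])


# Hypothetical token stream standing in for tokenized report text.
tokens = [
    "mass", "left", "calcification", "mass", "benign", "mass",
    "calcification", "right", "mass", "benign", "calcification",
    "density", "mass",
]
top_terms = heavy_hitters(tokens, k=2)
print(top_terms)
```

Memory use depends on the fixed table size (depth times width) plus the k tracked candidates, not on the number of distinct terms in the reports. An exact counter, by contrast, stores one entry per distinct term.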
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Semantic Web and Ontologies · Machine Learning in Healthcare
