Synergizing Data Imputation and Electronic Health Records for Advancing Prostate Cancer Research: Challenges, and Practical Applications
Abderrahim Oussama Batouche, Eugen Czeizler, Miika Koskinen, Tuomas Mirtti, Antti Sakari Rannikko

TL;DR
This paper develops a novel pipeline to extract and integrate structured and unstructured EHR data for prostate cancer, overcoming challenges of data ambiguity and missing entries to facilitate advanced research.
Contribution
It introduces a new method combining NLP and data validation to improve data quality and completeness in prostate cancer EHR datasets.
Findings
Enhanced data extraction accuracy for prostate cancer EHRs
Improved completeness of structured and unstructured data integration
Potential for advancing prostate cancer research using enriched datasets
Abstract
The presence of detailed clinical information in electronic health record (EHR) systems presents promising prospects for enhancing patient care through automated retrieval techniques. Nevertheless, it is widely acknowledged that accessing data within EHRs is hindered by various methodological challenges. Specifically, the clinical notes stored in EHRs are composed in a narrative form, making them prone to ambiguous formulations and highly unstructured data presentations, while structured reports commonly suffer from missing and/or erroneous data entries. This inherent complexity poses significant challenges when attempting automated large-scale medical knowledge extraction tasks, necessitating the application of advanced tools, such as natural language processing (NLP), as well as data audit techniques. This work aims to address these obstacles by creating and validating a novel…
Taxonomy
Topics: Data Quality and Management · Machine Learning in Healthcare · Topic Modeling
