Process for Quality Management of Electronic Medical Records–Based Data: Case Study Using Real Colorectal Cancer Data
NaYoung Park, Kyungmin Na, Woongsang Sunwoo, Jeong-Heum Baek, Youngho Lee, Suehyun Lee, Hyekyung Woo

TL;DR
This paper presents a rules-based quality management process (QMP) for electronic medical record data and applies it to real colorectal cancer data, reducing missing values and improving the accuracy of predictive models.
Contribution
A rules-based quality management process (QMP) for real-world clinical data, applied to colorectal cancer data in Korea.
Findings
The QMP reduced missing data for TNM staging from 75.3% to 35.7%.
TNM stage and detailed code variables became important in the improved predictive model.
The process is applicable to real-world datasets, showing potential for broader clinical use.
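To make the missing-data finding concrete, the following is a minimal Python sketch of how a rules-based fill can lower the missing rate of a stage variable. It is an illustration only, not the authors' rule set: the record layout (`t`, `n`, `m`, `stage`), the function names, and the toy records are all hypothetical, and the grouping rule is a simplified version of the usual TNM stage grouping (M1 gives stage IV; otherwise node-positive gives III, T3 to T4 gives II, T1 to T2 gives I).

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]  # hypothetical layout: keys "t", "n", "m", "stage"


def derive_stage(t: Optional[str], n: Optional[str], m: Optional[str]) -> Optional[str]:
    """Derive an overall stage group from T, N, M components (simplified, assumed rule)."""
    if m == "M1":
        return "IV"  # distant metastasis decides the group regardless of T and N
    if m != "M0" or t is None or n is None:
        return None  # not enough information: leave the value missing
    if n in ("N1", "N2"):
        return "III"
    if n == "N0" and t in ("T3", "T4"):
        return "II"
    if n == "N0" and t in ("T1", "T2"):
        return "I"
    return None  # unrecognized codes are left missing rather than guessed


def fill_stage(records: List[Record]) -> List[Record]:
    """Return copies of the records with missing stages filled where the rule applies."""
    filled = []
    for record in records:
        record = dict(record)  # copy so the input is not mutated
        if record.get("stage") is None:
            record["stage"] = derive_stage(record.get("t"), record.get("n"), record.get("m"))
        filled.append(record)
    return filled


def missing_rate(records: List[Record], field: str) -> float:
    """Percentage of records whose `field` is missing (None)."""
    if not records:
        return 0.0
    missing = sum(1 for record in records if record.get(field) is None)
    return 100.0 * missing / len(records)


def relative_reduction(before: float, after: float) -> float:
    """Relative drop in a rate, as a percentage of the starting rate."""
    return 100.0 * (before - after) / before


# Toy records, invented for illustration only.
toy = [
    {"t": "T3", "n": "N0", "m": "M0", "stage": None},
    {"t": "T2", "n": "N1", "m": "M0", "stage": None},
    {"t": None, "n": None, "m": None, "stage": None},
    {"t": "T1", "n": "N0", "m": "M0", "stage": "I"},
]
cleaned = fill_stage(toy)
print(missing_rate(toy, "stage"), missing_rate(cleaned, "stage"))
print(round(relative_reduction(75.3, 35.7), 1))
```

The toy missing rates printed by the first `print` describe only the invented records. The second `print` applies `relative_reduction` to the reported figures: a drop from 75.3% to 35.7% is 39.6 percentage points, or roughly half of the originally missing TNM staging values.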
Abstract
As data-driven medical research advances, vast amounts of medical data are being collected, giving researchers access to important information. However, issues such as heterogeneity, complexity, and incompleteness of datasets limit their practical use. Errors and missing data negatively affect artificial intelligence–based predictive models, undermining the reliability of clinical decision-making. Thus, it is important to develop a quality management process (QMP) for clinical data. This study aimed to develop a rules-based QMP to address errors and impute missing values in real-world data, establishing high-quality data for clinical research. We used clinical data from 6491 patients with colorectal cancer (CRC) collected at Gachon University Gil Medical Center between 2010 and 2022, leveraging the clinical library established within the Korea Clinical Data Use Network for Research…
Taxonomy
Topics: Machine Learning in Healthcare · Data Quality and Management · Biomedical Text Mining and Ontologies
