# Process for Quality Management of Electronic Medical Records–Based Data: Case Study Using Real Colorectal Cancer Data

**Authors:** NaYoung Park, Kyungmin Na, Woongsang Sunwoo, Jeong-Heum Baek, Youngho Lee, Suehyun Lee, Hyekyung Woo

PMC · DOI: 10.2196/73884 · 2025-11-13

## TL;DR

This paper presents a rules-based quality management process (QMP) for electronic medical record data and applies it to real colorectal cancer data, reducing missing staging values so that TNM stage is picked up by feature selection in the predictive model.

## Contribution

A rules-based quality management process for real-world clinical data, specifically applied to colorectal cancer data in Korea.

## Key findings

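- The QMP reduced missing data for TNM staging from 75.3% to 35.7%, and for Surveillance, Epidemiology, and End Results (SEER) staging from 24.3% to 18.5%, across 6491 cases.
- After the QMP was applied, feature selection included TNM stage and detailed code variables, which had previously gone unselected.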
- The process is applicable to real-world datasets, showing potential for broader clinical use.

## Abstract

As data-driven medical research advances, vast amounts of medical data are being collected, giving researchers access to important information. However, issues such as heterogeneity, complexity, and incompleteness of datasets limit their practical use. Errors and missing data negatively affect artificial intelligence–based predictive models, undermining the reliability of clinical decision-making. Thus, it is important to develop a quality management process (QMP) for clinical data.

This study aimed to develop a rules-based QMP to address errors and impute missing values in real-world data, establishing high-quality data for clinical research.

We used clinical data from 6491 patients with colorectal cancer (CRC) collected at Gachon University Gil Medical Center between 2010 and 2022, leveraging the clinical library established within the Korea Clinical Data Use Network for Research Excellence. First, we conducted a literature review on the prognostic prediction of CRC to assess whether the data met our research purposes, comparing selected variables with real-world data. A labeling process was then implemented to extract key variables, which facilitated the creation of an automatic staging library. This library, combined with a rule-based process, allowed for systematic analysis and evaluation.
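The abstract does not spell out the rules inside the automatic staging library, so the following is only a minimal sketch of what one rule-based imputation step could look like. It assumes each record carries separate T, N, and M categories, and it uses a simplified AJCC-style grouping for colorectal cancer (M1 gives stage IV, node-positive M0 gives stage III, and so on). The names `normalize`, `derive_stage_group`, and `impute_stage`, and the record keys `t`, `n`, `m`, and `stage`, are hypothetical and not taken from the paper. Substages and clinical or pathological prefixes (such as `cT3` or `pN1`) are deliberately not handled.

```python
from typing import Optional


def normalize(value: Optional[str]) -> Optional[str]:
    """Trim and uppercase a raw category; blanks and 'X' categories count as missing."""
    if value is None:
        return None
    cleaned = value.strip().upper()
    if cleaned == "" or cleaned.endswith("X"):
        return None
    return cleaned


def derive_stage_group(t: Optional[str], n: Optional[str], m: Optional[str]) -> Optional[str]:
    """Return a coarse stage group ('0', 'I', 'II', 'III', 'IV'), or None if undetermined.

    Assumption: simplified AJCC-style grouping, not the paper's actual rule set.
    """
    t, n, m = normalize(t), normalize(n), normalize(m)

    # Distant metastasis decides stage IV regardless of T and N.
    if m is not None and m.startswith("M1"):
        return "IV"
    # Without a confirmed M0 we cannot rule out stage IV, so do not impute.
    if m is None or not m.startswith("M0"):
        return None

    # Any regional node involvement with M0 is stage III.
    if n is not None and n.startswith(("N1", "N2")):
        return "III"
    if n is None or not n.startswith("N0"):
        return None

    # N0 M0: the T category decides between stages 0, I, and II.
    if t is None:
        return None
    if t.startswith("TIS"):
        return "0"
    if t.startswith(("T1", "T2")):
        return "I"
    if t.startswith(("T3", "T4")):
        return "II"
    return None


def impute_stage(record: dict) -> dict:
    """Fill a missing 'stage' only when T, N, and M determine it; flag imputed values."""
    filled = dict(record)
    if not filled.get("stage"):
        derived = derive_stage_group(filled.get("t"), filled.get("n"), filled.get("m"))
        if derived is not None:
            filled["stage"] = derived
            filled["stage_imputed"] = True
    return filled
```

In this sketch a recorded stage is never overwritten, and records whose components leave the stage ambiguous stay missing rather than being guessed. That is one plausible reason a rules-based process would reduce missingness without eliminating it.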

The literature identifies the tumor, node, metastasis (TNM) stage as an important prognostic factor for CRC, yet feature selection on the raw real-world data did not select it. After applying the QMP, rates of missing data fell from 75.3% to 35.7% for TNM staging and from 24.3% to 18.5% for Surveillance, Epidemiology, and End Results (SEER) staging across 6491 cases, confirming the system’s effectiveness. Variable importance analysis through feature selection showed that TNM stage and detailed code variables, which were previously unselected, were included in the improved model.
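To put the reported missing-data rates in perspective, the short calculation below converts them into absolute (percentage-point) and relative reductions for TNM and for Surveillance, Epidemiology, and End Results (SEER) staging. The approximate record counts are derived from the rounded percentages in the abstract, so they are estimates and not figures reported by the authors. The helper name `reduction` is hypothetical.

```python
N_PATIENTS = 6491  # cohort size stated in the abstract

# Missing-data rates (%) before and after the QMP, as reported in the abstract.
REPORTED = {"TNM": (75.3, 35.7), "SEER": (24.3, 18.5)}


def reduction(before_pct: float, after_pct: float) -> tuple[float, float]:
    """Return (absolute reduction in percentage points, relative reduction in %)."""
    absolute = before_pct - after_pct
    relative = absolute / before_pct * 100
    return round(absolute, 1), round(relative, 1)


for name, (before, after) in REPORTED.items():
    abs_pp, rel_pct = reduction(before, after)
    # Estimate only: the inputs are rounded percentages, not raw counts.
    approx_records = round(N_PATIENTS * abs_pp / 100)
    print(f"{name}: -{abs_pp} points ({rel_pct}% relative), ~{approx_records} records filled")
```

Under these assumptions, TNM missingness drops by 39.6 percentage points (about 52.6% in relative terms), while SEER missingness drops by 5.8 points (about 23.9% relative).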

In sum, we developed a rules-based QMP to address errors and impute missing values in Korea Clinical Data Use Network for Research Excellence data, enhancing data quality. The applicability of the process to real-world datasets highlights its potential for broader use in clinical studies and cancer research.

## Linked entities

- **Diseases:** colorectal cancer (MONDO:0005575)

## Full-text entities

- **Diseases:** CRC (MESH:D015179), cancer (MESH:D009369)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12614659/full.md

---
Source: https://tomesphere.com/paper/PMC12614659