# Data profile: cancer sample cohorts (stomach, breast, colorectal, and liver) in Korea

**Authors:** Daewoo Pak, Suk Yong Jang, Jin-Ha Yoon, Dong Wook Kim, Jin-Won Noh, Dong-Woo Choi, Minyeong Guk, Hyeri Kim, Ju-Won Oh, Heejung Chae, Hyun-Joo Kong, Gi Hyun Kim, Ji Woong Nam, Ga Ram Lee, Dayun Park, Jehoo Jeon, Byungyoon Yun, Ki-Bong Yoo, Kui Son Choi

PMC · DOI: 10.4178/epih.e2025058 · Epidemiology and Health · 2025-10-14

## TL;DR

This paper introduces new Korean cancer sample cohorts for four cancer types (stomach, breast, colorectal, and liver), built by linking four population-based public health data sources for research.

## Contribution

The paper presents a new nationally representative sample cohort dataset for four cancers in Korea, integrating diverse public health data sources.

## Key findings

- The dataset includes approximately 21% of all cancer patients in Korea from 2012 to 2019.
- It covers stomach, breast, colorectal, and liver cancers, with cohort sizes of 51,951, 39,586, 53,485, and 27,375 patients, respectively.
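- The data include cancer incidence information, demographics, socioeconomic variables, health utilization (procedures, diagnoses, and medications), general health checkups, cancer screening before and after cancer incidence, and death information.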

## Abstract

Cancer Public Library Database (CPLD) links data from four major population-based public sources: the Korea National Cancer Incidence Database in the Korea Central Cancer Registry, cause-of-death data in Statistics Korea, the National Health Information Database in the National Health Insurance Service, and the National Health Insurance Research Database in the Health Insurance Review & Assessment Service. The National Cancer Data Center has developed a new nationally representative sample cohort dataset from Korean Clinical Data Utilization for Research Excellence project (K-CURE) CPLD: Stomach Cancer Sample Cohort, Breast Cancer Sample Cohort, Colorectal Cancer Sample Cohort, and Liver Cancer Sample Cohort. The sample populations consisted of approximately 21% of all cancer patients from 2012 to 2019. The populations of the Stomach Cancer Sample Cohort, Breast Cancer Sample Cohort, Colorectal Cancer Sample Cohort, and Liver Cancer Sample Cohort were 51,951, 39,586, 53,485, and 27,375 patients, respectively. The dataset included cancer incidence information, demographics, socioeconomic variables, health utilization data (procedures, diagnoses, and medications), general health checkup data, cancer screening data before and after the cancer incidence, as well as death information. These cohorts could help researchers analyze time-to-event data on mortality, treatment outcomes, comorbid conditions following a cancer diagnosis, and cancer incidence risk factors. The data can be accessed through the K-CURE portal (https://k-cure.mohw.go.kr/).
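The abstract names time-to-event analysis of mortality as a main use of these cohorts. As a minimal sketch of what such an analysis might look like, the block below computes a Kaplan-Meier survival curve in plain Python. The function name `kaplan_meier` and the toy records are hypothetical and synthetic; they are not drawn from the K-CURE data, and the paper does not prescribe this estimator.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up time per patient (any consistent unit)
    events:    1 if death was observed at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(durations, events) if e == 1)
    exits = Counter(durations)  # every patient leaves the risk set at their own time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d > 0:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Deaths and censored patients at t both leave the risk set.
        at_risk -= exits[t]
    return curve


# Synthetic toy records (hypothetical), NOT taken from the cohort data.
toy_durations = [5, 8, 8, 12, 15, 20]
toy_events = [1, 1, 0, 1, 0, 0]
curve = kaplan_meier(toy_durations, toy_events)

for time, prob in curve:
    print(f"t={time}: S(t)={prob:.4f}")
```

In practice, a maintained survival-analysis library would likely be preferable to a hand-rolled estimator; the sketch only illustrates how follow-up time and death status combine into a survival curve.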

## Linked entities

- **Diseases:** stomach cancer (MONDO:0001056), breast cancer (MONDO:0004989), colorectal cancer (MONDO:0005575), liver cancer (MONDO:0002691)

## Full-text entities

- **Diseases:** Stomach Cancer (MESH:D013274), Colorectal Cancer (MESH:D015179), death (MESH:D003643), Cancer (MESH:D009369), Breast Cancer (MESH:D001943), Liver Cancer (MESH:D006528)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12869124/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12869124/full.md

## References

31 references — full list in the complete paper: https://tomesphere.com/paper/PMC12869124/full.md

---
Source: https://tomesphere.com/paper/PMC12869124