# Clinical Laboratory Parameter–Driven Machine Learning for Participant Selection in Bioequivalence Studies Among Patients With Gastric Cancer: Framework Development and Validation Study

**Authors:** Byungeun Shon, Sook Jin Seong, Eun Jung Choi, Mi-Ri Gwon, Hae Won Lee, Jaechan Park, Ho-Young Chung, Sungmoon Jeong, Young-Ran Yoon

PMC · DOI: 10.2196/64845 · JMIR AI · 2025-05-05

## TL;DR

This study developed and validated a machine learning framework that uses routine clinical laboratory parameters from electronic medical records to identify patients with gastric cancer who are eligible for a bioequivalence study.

## Contribution

A novel ML framework using clinical lab parameters to improve participant selection in bioequivalence studies.

## Key findings

- The weighted ensemble model achieved an F1-score above 0.8 and an AUC above 0.8, indicating strong performance in identifying valid candidates.
- In a case study, the model reduced screening workload by 57%: it identified 150 valid patients from a pool of 209, whereas random selection required 485 patients to reach the same number.
- The model's high sensitivity made it more efficient at prioritizing patients for screening.

## Abstract

Insufficient participant enrollment is a major factor responsible for clinical trial failure.

We formulated a machine learning (ML)–based framework using clinical laboratory parameters to identify participants eligible for enrollment in a bioequivalence study.

We acquired records of 11,592 patients with gastric cancer from the electronic medical records of Kyungpook National University Hospital in Korea. The ML model was developed using 8 clinical laboratory parameters, including complete blood count and liver and kidney function tests, along with the dates of acquisition. Two datasets were collected: (1) a training dataset to design an ML-based candidate selection method and (2) a test dataset to evaluate the performance of the proposed method. The generalization performance of the ML-based method was confirmed using the F1-score and the area under the curve (AUC). The proposed model was compared with a random selection method to evaluate its efficacy in recruiting participants.
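For readers unfamiliar with the two metrics named above, the following is a minimal sketch of how an F1-score and an AUC can be computed from binary labels and model scores. The labels, scores, and the 0.5 decision threshold are invented toy values, and the helper names `f1_score` and `roc_auc` are hypothetical; none of this is taken from the paper's data or code.

```python
def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for binary labels (1 = valid candidate)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (assumption): 1 = eligible patient, 0 = ineligible patient.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.7, 0.1]

# Assumed decision threshold of 0.5 to turn scores into class predictions.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

f1 = f1_score(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"F1 = {f1:.4f}, AUC = {auc:.4f}")
```

F1 depends on the chosen threshold, while AUC summarizes ranking quality across all thresholds, which is one reason the two are often reported together.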

The weighted ensemble model achieved strong performance with an F1-score above 0.8 and an AUC value exceeding 0.8, demonstrating its ability to accurately identify valid clinical trial candidates while minimizing misclassification. Its high sensitivity further enhanced the model’s efficiency in prioritizing patients for screening. In a case study, the proposed ML model reduced the workload by 57%, efficiently identifying 150 valid patients from a pool of 209, compared to the 485 patients required by random selection.
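The case-study figures can be checked with a short calculation. This sketch assumes that "workload" means the number of patients who must be screened to find 150 valid candidates; the yields printed at the end are derived here from the reported counts and are not values stated in the abstract.

```python
# Counts reported in the abstract's case study.
valid_target = 150      # valid patients to be found
model_screened = 209    # patients screened when following the ML model
random_screened = 485   # patients screened under random selection


def workload_reduction(screened_with_model, screened_at_random):
    """Fraction of screening effort saved relative to random selection."""
    return 1 - screened_with_model / screened_at_random


reduction = workload_reduction(model_screened, random_screened)

# Derived quantities (not reported in the abstract): share of screened
# patients who turn out to be valid under each strategy.
model_yield = valid_target / model_screened
random_yield = valid_target / random_screened

print(f"Workload reduction: {reduction:.1%}")
print(f"Yield with model:   {model_yield:.1%}")
print(f"Yield at random:    {random_yield:.1%}")
```

The reduction of roughly 57% follows directly from screening 209 rather than 485 patients.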

The proposed ML-based framework using clinical laboratory parameters can be used to identify patients eligible for a clinical trial, enabling faster participant enrollment.
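One way to picture how score-based prioritization can speed up enrollment is to count how many patients must be screened before a target number of valid candidates is reached. The sketch below is purely illustrative: the patient list, scores, and the helper `screened_until_target` are hypothetical, and the paper's actual selection procedure may differ.

```python
def screened_until_target(eligibility_in_order, target):
    """Return how many patients are screened before `target` valid ones are found.

    `eligibility_in_order` is a sequence of 1 (valid) / 0 (not valid) in
    screening order. Returns None if the target is never reached.
    """
    valid = 0
    for screened, is_valid in enumerate(eligibility_in_order, start=1):
        valid += is_valid
        if valid == target:
            return screened
    return None


# Toy cohort (assumption): (model score, true eligibility) in record order.
patients = [
    (0.2, 0), (0.9, 1), (0.1, 0), (0.8, 1),
    (0.4, 0), (0.7, 0), (0.6, 1), (0.3, 0),
]
target = 3

# Baseline: screen patients in the order they appear in the records.
unranked = screened_until_target([ok for _, ok in patients], target)

# Prioritized: screen patients with the highest model score first.
ranked_patients = sorted(patients, key=lambda p: p[0], reverse=True)
ranked = screened_until_target([ok for _, ok in ranked_patients], target)

print(f"Screened without ranking: {unranked}")
print(f"Screened with ranking:    {ranked}")
```

In this toy cohort, ranking by score reaches the target after fewer screenings, which is the mechanism behind the workload comparison in the case study.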

## Linked entities

- **Diseases:** gastric cancer (MONDO:0001056)

## Full-text entities

- **Diseases:** Gastric Cancer (MESH:D013274)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12223687/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12223687/full.md

## References

25 references — full list in the complete paper: https://tomesphere.com/paper/PMC12223687/full.md

---
Source: https://tomesphere.com/paper/PMC12223687