# Machine learning‐based early prediction of asthma in preschoolers: The COCOA birth cohort study

**Authors:** Chang Hoon Han, Seok‐Jae Heo, Haerin Jang, So‐Yeon Lee, Ji Soo Park, Dong In Suh, Youn Ho Shin, Jihyun Kim, Kangmo Ahn, Myung Hyun Sohn, Eom Ji Choi, Sun Hee Choi, Hey‐Sung Baek, Soo‐Jong Hong, Kyung Won Kim, Inkyung Jung, Soo Yeon Kim

PMC · DOI: 10.1111/pai.70223 · Pediatric Allergy and Immunology · 2025-10-17

## TL;DR

This study developed Random Forest-based machine learning models and a questionnaire-based scoring tool to predict asthma at age 3 years, using data from COCOA, a prospective birth cohort in South Korea.

## Contribution

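The study introduces ML models that reach a validation AUROC of 0.774 with data collected up to age 2 years, together with a questionnaire-based scoring tool (AUROC 0.790) suited to clinical use, for predicting asthma at age 3 years.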

## Key findings

- ML model discrimination improved as data accumulated: validation-cohort AUROC values were 0.614, 0.726, and 0.774 for the 6-month, 1-year, and 2-year models, respectively.
- The questionnaire-based scoring tool had an AUROC of 0.790, comparable to the ML-based model.
- Important predictors included paternal total IgE levels, maternal iron supplementation during pregnancy, parental asthma history, nut allergy history, and recent lower respiratory infections.

## Abstract

Early prediction of asthma in preschoolers, which is crucial for timely intervention, remains challenging. This study aimed to develop a machine learning (ML)‐based model and a questionnaire‐based scoring tool for the prediction of asthma at age 3 years.

Data from the COhort for Childhood Origin of Asthma and allergic diseases (COCOA), a comprehensive prospective birth cohort in South Korea, were used. Children with complete 3‐year follow‐up (n = 2007) were divided into development (n = 1472) and validation (n = 535) cohorts based on birth year. Asthma diagnosis at age 3 years was based on physician diagnosis, recurrent wheezing episodes, asthma treatment, or parental reports. Random Forest‐based predictive models were developed using data collected until the age of 2 years, after initial feature selection via least absolute shrinkage and selection operator (LASSO) regression. A questionnaire‐based scoring tool was also developed and compared with multiple ML algorithms.
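The birth-year split described above can be illustrated with a minimal Python sketch. The abstract does not state which birth year separates the two cohorts, so `cutoff_year`, the `Child` record, and the toy data below are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Child:
    """Hypothetical minimal record; real cohort data has many more fields."""
    child_id: int
    birth_year: int
    asthma_at_3: bool


def split_by_birth_year(children, cutoff_year):
    """Temporal split: earlier births form the development cohort,
    births in or after cutoff_year form the validation cohort."""
    development = [c for c in children if c.birth_year < cutoff_year]
    validation = [c for c in children if c.birth_year >= cutoff_year]
    return development, validation


# Toy records (invented for illustration only).
children = [
    Child(1, 2009, False),
    Child(2, 2010, True),
    Child(3, 2012, False),
    Child(4, 2013, True),
    Child(5, 2014, False),
]

# The cutoff year here is an assumption, not the study's actual value.
development, validation = split_by_birth_year(children, cutoff_year=2013)
```

Splitting on birth year rather than at random means the validation children are enrolled later than every development child, which gives a somewhat stricter check of how a model carries over to new patients.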

The ML‐based prediction models showed improved performance as the data accumulated. The 6‐month, 1‐year, and 2‐year models had area under the receiver operating characteristic curve (AUROC) values of 0.614, 0.726, and 0.774, respectively, in the validation cohort. The performance of the questionnaire‐based scoring tool (AUROC, 0.790) was comparable to that of the ML‐based model. Important predictors included paternal total IgE levels, maternal iron supplementation during pregnancy, parental asthma history, nut allergy history, and recent lower respiratory infections.
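To make the AUROC comparison concrete, the sketch below scores a toy questionnaire and computes AUROC as the probability that a randomly chosen child with asthma receives a higher score than a randomly chosen child without it. The item names follow predictors named in the abstract, but the point weights in `POINTS` and the toy answers are hypothetical; the paper's actual scoring tool is not reproduced here.

```python
def auroc(scores, labels):
    """AUROC as the fraction of (positive, negative) pairs in which the
    positive case has the higher score; tied pairs count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical point weights, chosen only for illustration.
POINTS = {
    "parental_asthma": 2,
    "nut_allergy": 2,
    "recent_lower_respiratory_infection": 1,
}


def questionnaire_score(answers):
    """Sum the points of every item answered 'yes'."""
    return sum(points for item, points in POINTS.items() if answers.get(item))


# Toy data: (questionnaire answers, asthma at age 3 as 1/0), invented.
toy = [
    ({"parental_asthma": True, "nut_allergy": True}, 1),
    ({"recent_lower_respiratory_infection": True}, 0),
    ({"parental_asthma": True}, 1),
    ({}, 0),
    ({"nut_allergy": True}, 0),
    ({"recent_lower_respiratory_infection": True}, 1),
]

scores = [questionnaire_score(answers) for answers, _ in toy]
labels = [label for _, label in toy]
toy_auroc = auroc(scores, labels)
```

An AUROC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the reported values of 0.614 to 0.790 should be read.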

Our study successfully developed robust predictive models for early asthma that demonstrated high performance. The questionnaire‐based scoring tool offers particular value because of its clinical applicability. Further validation in diverse populations and investigation of the causative pathways of the identified predictors are necessary to enhance clinical utility.

## Linked entities

- **Diseases:** asthma (MONDO:0004979)

## Full-text entities

- **Diseases:** respiratory infections (MESH:D012141), wheezing (MESH:D012135), Asthma (MESH:D001249), allergic diseases (MESH:D004342), nut allergy (MESH:D021184)
- **Chemicals:** iron (MESH:D007501)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12533341/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12533341/full.md

## References

33 references — full list in the complete paper: https://tomesphere.com/paper/PMC12533341/full.md

---
Source: https://tomesphere.com/paper/PMC12533341