# Epidemiological association and machine learning-based prediction of lung cancer risk linked to long-term lagged satellite-derived PM2.5 in China

**Authors:** Feiran Wei, Shijun Yang, Huiying Wang, Meng Zhao, Jinyi Zhou, Xiaobing Shen, Renqiang Han, Gaoqiang Fei

PMC · DOI: 10.3389/fpubh.2025.1536509 · Frontiers in Public Health · 2025-05-30

## TL;DR

This ecological study links long-term, lagged PM2.5 exposure to lung cancer incidence in Jiangsu Province, China, and shows that a combined machine learning model predicts incidence more accurately than any single model.

## Contribution

The study introduces an integrated machine learning model for predicting lung cancer risk based on long-term lagged PM2.5 exposure.

## Key findings

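- Lung cancer incidence is significantly correlated with PM2.5 concentration at multiple historical lags, with the strongest correlation at lag 9 (exposure nine years before the incidence year).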
- The combined machine learning model outperforms single models in predicting lung cancer risk.
- Long-term PM2.5 exposure is closely associated with increased lung cancer incidence.

## Abstract

This study investigated the association between long-term PM2.5 exposure and lung cancer incidence in Jiangsu Province, China. We aimed to explore the effects of historical PM2.5 exposure at different time lags and to build a prediction model using machine learning methods.

The study used an ecological epidemiological design.

Lung cancer incidence data from Jiangsu Province (2014–2018) were combined with satellite-derived annual PM2.5 concentrations at ten lags (lag 0 to lag 9). Correlation and grey correlation analyses were performed to evaluate the lagged relationship between PM2.5 exposure and lung cancer incidence. To address multicollinearity among the lagged exposures, ridge regression, support vector regression, and a back-propagation artificial neural network were employed. The combined prediction model was constructed using the optimal weighting method.
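The lagged correlation and grey correlation steps can be illustrated with a minimal sketch. It is not the authors' code: the function names, the mean normalisation, the distinguishing coefficient `rho = 0.5`, and the synthetic data are all assumptions made for illustration, and the grey relational grade follows one common (Deng-style) formulation. Real lagged PM2.5 series would be strongly collinear, unlike the independent columns generated here.

```python
import numpy as np

def lagged_correlations(incidence, pm25_by_lag):
    """Pearson r between incidence and each lag column of PM2.5."""
    return np.array([
        np.corrcoef(incidence, pm25_by_lag[:, k])[0, 1]
        for k in range(pm25_by_lag.shape[1])
    ])

def grey_relational_grades(reference, factors, rho=0.5):
    """Grey relational grade of each factor column against the reference series."""
    # Mean-normalise so that series in different units are comparable.
    ref = reference / reference.mean()
    fac = factors / factors.mean(axis=0)
    delta = np.abs(fac - ref[:, None])      # absolute differences, shape (n, k)
    d_min, d_max = delta.min(), delta.max()
    if d_max == 0:                          # every factor identical to the reference
        return np.ones(factors.shape[1])
    coeff = (d_min + rho * d_max) / (delta + rho * d_max)
    return coeff.mean(axis=0)               # average coefficient per factor

# Synthetic stand-in data (hypothetical, NOT the study's data):
# rows are regions, columns are PM2.5 at lag 0 .. lag 9.
rng = np.random.default_rng(0)
n_regions = 40
pm25 = rng.uniform(30.0, 80.0, size=(n_regions, 10))
# Incidence is made to depend on the lag-9 column by construction.
incidence = 40.0 + 0.3 * pm25[:, 9] + rng.normal(0.0, 1.0, n_regions)

r = lagged_correlations(incidence, pm25)
grades = grey_relational_grades(incidence, pm25)
print("strongest lag by Pearson r:", int(np.argmax(r)))
print("grey relational grades:", np.round(grades, 3))
```

Because the synthetic incidence is built from the lag-9 column, the sketch only demonstrates the mechanics of ranking lags; it says nothing about the effect sizes reported in the paper.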

The incidence of lung cancer was significantly correlated with PM2.5 concentration at different historical time points, with the strongest correlation at lag 9. The combined prediction model that integrates multiple prediction methods showed higher accuracy and reliability in predicting lung cancer incidence than a single model.
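The abstract does not spell out which optimal weighting formulation was used. One common version chooses weights that sum to one and minimise the combined model's sum of squared errors, giving w = E⁻¹1 / (1ᵀE⁻¹1), where E is the cross-product matrix of the individual models' errors. The sketch below assumes that formulation; `optimal_weights` is a hypothetical helper, and the three prediction columns are synthetic stand-ins for the ridge, support vector regression, and neural network outputs.

```python
import numpy as np

def optimal_weights(y_true, preds):
    """Weights summing to 1 that minimise the combined sum of squared errors."""
    errors = preds - y_true[:, None]        # (n_samples, n_models)
    E = errors.T @ errors                   # error cross-product matrix
    z = np.linalg.solve(E, np.ones(E.shape[0]))
    return z / z.sum()                      # w = E^-1 1 / (1' E^-1 1)

# Synthetic stand-in data (hypothetical, NOT the study's results).
rng = np.random.default_rng(1)
y_true = rng.uniform(40.0, 70.0, size=50)
noise_sd = [1.0, 2.0, 3.0]                  # three models of differing quality
preds = np.column_stack(
    [y_true + rng.normal(0.0, s, size=50) for s in noise_sd]
)

w = optimal_weights(y_true, preds)
combined = preds @ w

sse_single = ((preds - y_true[:, None]) ** 2).sum(axis=0)
sse_combined = ((combined - y_true) ** 2).sum()
print("weights:", np.round(w, 3))
print("single-model SSE:", np.round(sse_single, 2))
print("combined SSE:", round(float(sse_combined), 2))
```

On the data used to fit the weights, the combined error cannot exceed that of the best single model, since putting all weight on one model is itself a feasible solution. This formulation allows negative weights, and held-out performance is not guaranteed to improve, so the weights should be evaluated on data not used to estimate them.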

Long-term exposure to PM2.5, especially exposure with a long lag time, is closely related to lung cancer incidence. The integrated machine learning prediction model can be used as a reliable tool to assess the health risks of air pollution.

## Linked entities

- **Diseases:** lung cancer (MONDO:0005138)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12162561/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12162561/full.md

## References

34 references — full list in the complete paper: https://tomesphere.com/paper/PMC12162561/full.md

---
Source: https://tomesphere.com/paper/PMC12162561