# CINPred: a risk prediction tool for cervical intraepithelial neoplasia

**Authors:** Jiaxuan Gu, Qiao Wang, Aili Li, Penghui Li, Saicong Lu, Zhen Wang, Lin Du, Feifei Zhao, Tingting Zhao, Feng Tian

PMC · DOI: 10.3389/fonc.2026.1702579 · Frontiers in Oncology · 2026-02-10

## TL;DR

This paper introduces CINPred, a machine learning tool that predicts cervical intraepithelial neoplasia (CIN) risk using clinical data to help prevent cervical cancer.

## Contribution

The novel contribution is the development of CINPred, a CatBoost-based prediction tool for CIN risk assessment.

## Key findings

- CatBoost and GBDT achieved the highest AUCs for CIN prediction (0.89 and 0.87, respectively).
- TCT was identified as the most significant risk factor for CIN based on SHAP values.
- CINPred is available as a web-based tool for quick CIN risk screening.

## Abstract

Cervical intraepithelial neoplasia (CIN) is a group of precancerous lesions associated with invasive carcinoma of the cervix, and it reflects the continuous progression toward cervical cancer (CC). Early detection and standard treatment can therefore effectively prevent the progression of CIN to CC. The objective of this study was to establish a machine learning model that uses clinical data to predict the risk of CIN in women, and to develop a clinical prediction tool and explore its broader clinical significance.

Female patients who sought consultation for cervical lesions at a hospital in Jiangsu province between 2018 and 2021 were enrolled in this study. The feature variables considered in the analysis were age, ThinPrep cytological test (TCT), human papillomavirus (HPV) genotype, multiple infection assessment, folate receptor-mediated tumor detection (FRD) and cotton-tipped swab test. Several algorithms were used to build the models, including adaptive boosting (AdaBoost), gradient boosting decision tree (GBDT), categorical boosting (CatBoost) and others. The performance of the models was rigorously evaluated. SHapley Additive exPlanation (SHAP) values were used to identify the factors affecting the risk of CIN.
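To make the SHAP step concrete, here is a minimal sketch of the underlying idea: exact Shapley values for a single prediction, computed by brute force over all feature coalitions. It is not the authors' pipeline. The toy scoring function, its coefficients, the feature names and the all-zero baseline are assumptions made up for illustration, and absent features are replaced by a single baseline value, which is a simplification of what SHAP implementations do.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition are set to their baseline value.
    Cost grows as 2^n, so this is only practical for a handful of features.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude feature i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical feature names and a made-up scoring function (NOT the paper's model).
feature_names = ["tct_abnormal", "age_scaled", "frd_positive"]


def toy_model(z):
    return 2.0 * z[0] + 1.0 * z[1] + 0.5 * z[0] * z[2]


x = [1.0, 1.0, 1.0]         # instance to explain
baseline = [0.0, 0.0, 0.0]  # assumed reference input

phi = shapley_values(toy_model, x, baseline)

# Rank features by absolute contribution for this one instance.
ranking = sorted(zip(feature_names, phi), key=lambda t: abs(t[1]), reverse=True)
for name, contribution in ranking:
    print(f"{name}: {contribution:.2f}")
```

The contributions sum to the difference between the prediction and the baseline prediction. A global ranking of the kind reported in the abstract is typically obtained by averaging the absolute per-instance values over many patients; this sketch explains only one instance.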

For predicting CIN events, CatBoost and GBDT had the highest areas under the receiver operating characteristic curve (AUC; 0.89 and 0.87, respectively). AdaBoost had the highest F1 score (0.81), followed by random forest (RF), logistic regression (LR) and stochastic gradient descent (SGD). SHAP values suggested that the variables affecting the risk of CIN, in descending order of importance, were TCT, age, FRD, cotton-tipped swab, multiple infection and HPV.
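The two metrics reported above measure different things, which is why one model can lead on AUC and another on F1. The sketch below computes both from scratch on toy data. The labels, scores and the 0.5 decision threshold are illustrative assumptions, not values from the study.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def f1_score(labels, scores, threshold=0.5):
    """F1 = harmonic mean of precision and recall at a fixed threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (made up): 1 = CIN, 0 = no CIN; scores are predicted risks.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.7, 0.1]

auc = roc_auc(labels, scores)
f1 = f1_score(labels, scores, threshold=0.5)
print(f"AUC = {auc:.4f}, F1 = {f1:.2f}")
```

AUC depends only on how the scores rank positives against negatives, whereas F1 depends on the chosen threshold, so the two can order models differently.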

A novel CatBoost-based risk prediction tool for CIN (CINPred) has been developed and is available at https://medinfo.hebeu.edu.cn/shiny/CINPred/. CINPred can be used as a quick screening tool to assess CIN risk, offering significant benefits for the development of personalized treatment plans.

## Linked entities

- **Diseases:** cervical intraepithelial neoplasia (MONDO:0022394), cervical cancer (MONDO:0002974)

## Full-text entities

- **Genes:** SHROOM4 (shroom family member 4) [NCBI Gene 57477] {aka MRXSSDS, SHAP, shrm4}
- **Diseases:** ACC (MESH:D000230), Cancer (MESH:D009369), CIN (MESH:D002578), inflammatory (MESH:D007249), HSIL (MESH:D000081483), CC (MESH:D002583), HPV infection (MESH:D030361), Chronic cervicitis (MESH:D002575), SCC (MESH:D002294), cervical carcinogenesis (MESH:D063646), ML (MESH:D007859), adenocarcinoma in situ (MESH:D065311), TCT (MESH:D013736), infected (MESH:D007239), FRD (MESH:C562799), viral infection (MESH:D014777), deaths (MESH:D003643), cervical precancerous lesion (MESH:D011230), cervical polyp (MESH:D011127), ASC-US (MESH:D065309), grade (MESH:D008228)
- **Chemicals:** BER-YXY-2024044 (-)
- **Species:** Human papillomavirus (species) [taxon 10566], Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12930185/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12930185/full.md

## References

30 references — full list in the complete paper: https://tomesphere.com/paper/PMC12930185/full.md

---
Source: https://tomesphere.com/paper/PMC12930185