# Interpretable machine learning for urothelial cells classification and risk scoring in urine cytology

**Authors:** Lei Xiong, Xinyi Cao, Yu Shang, Zongyue Lu, Hao Jiang, Chang Shi, Chengzhi Zhang, Zhongjing Ma, Lili Tian, Xiaojie Wang, Jiwei Liu, Jia Li, Fengqi Fang

PMC · DOI: 10.1016/j.isci.2025.114259 · iScience · 2025-11-27

## TL;DR

An interpretable machine learning framework classifies urothelial cells in urine cytology and estimates slide-level risk of high-grade urothelial carcinoma from quantitative morphological features.

## Contribution

Development of an interpretable ML framework for urothelial cell classification and risk scoring in urine cytology.

## Key findings

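- Ordinal logistic regression and random forest models achieved over 90% accuracy in classifying urothelial cells as normal, atypical, or suspicious.
- Slide-level risk scores, derived from aggregated cell probabilities, stratified negative, atypical, low-grade, and high-grade urothelial carcinoma cases in a validation set of 247 cases (p < 0.0001).
- Interpretable morphological features were the major contributors to prediction and align with established cytologic criteria, which keeps predictions transparent.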
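
To illustrate how an ordinal model turns cell features into probabilities over ordered categories, here is a minimal sketch of the standard proportional-odds formulation, where P(y ≤ k) = sigmoid(θ_k − w·x). It is not the authors' implementation: the two features, the weights, and the thresholds are hypothetical values chosen only for illustration, and the summary does not state which ordinal parameterization the paper uses.

```python
import math

# Ordered categories named in the summary.
CLASSES = ("normal", "atypical", "suspicious")


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def ordinal_probs(features, weights, thresholds):
    """Proportional-odds model: P(y <= k) = sigmoid(theta_k - w.x).

    `thresholds` must be strictly increasing, one fewer than the classes.
    """
    score = sum(w * x for w, x in zip(weights, features))
    # Cumulative probabilities for each cut point; the last class closes at 1.
    cumulative = [sigmoid(t - score) for t in thresholds] + [1.0]
    # Per-class probability is the difference of adjacent cumulative values.
    probs = [cumulative[0]] + [
        cumulative[k] - cumulative[k - 1] for k in range(1, len(cumulative))
    ]
    return dict(zip(CLASSES, probs))


# Hypothetical coefficients and inputs (NOT taken from the paper):
# feature 0 = nuclear-to-cytoplasmic ratio, feature 1 = hyperchromasia score.
weights = [4.0, 1.5]
thresholds = [2.0, 3.5]
cell = [0.75, 0.8]

probs = ordinal_probs(cell, weights, thresholds)
print(probs)
```

Because every coefficient maps to one named feature, the sign and size of each weight can be read directly against cytologic criteria, which is the kind of transparency the findings describe.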

## Abstract

Urine cytology is widely used for detecting urothelial carcinoma (UC), though its performance is constrained by limited sensitivity and substantial interobserver variability. An interpretable machine learning framework was developed to classify urothelial cells and to estimate slide-level risk of high-grade UC. Twenty quantitative features representing cytomorphologic criteria defined by the Paris System were extracted from 10,230 expert-annotated urothelial cells. Ordinal logistic regression and random forest models were trained and validated, achieving over 90% accuracy for classifying cells into normal, atypical, or suspicious categories. Interpretable morphological features were identified as major contributors to prediction. Slide-level risk scores were derived from aggregated cell probabilities in a validation set of 247 cases. These scores effectively stratified negative, atypical, low-grade, and high-grade UC cases (p < 0.0001). Through alignment with established cytologic criteria, this feature-based framework provides a transparent and quantitative approach that may improve consistency, efficiency, and interpretability in digital urinary cytology.
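
The abstract says slide-level risk scores come from aggregated cell probabilities but does not specify the aggregation in this summary. The sketch below assumes the simplest choice, the mean per-cell probability of the suspicious class; the function name `slide_risk_score`, the dictionary layout, and the example values are all hypothetical, and the paper's actual rule may differ.

```python
from statistics import mean


def slide_risk_score(cell_probs, key="suspicious"):
    """Aggregate per-cell class probabilities into one slide-level score.

    Assumption: the score is the mean probability of the `key` class
    across all classified cells on the slide.
    """
    if not cell_probs:
        raise ValueError("slide has no classified cells")
    return mean(p[key] for p in cell_probs)


# Hypothetical per-cell outputs for one slide (not data from the paper).
slide = [
    {"normal": 0.90, "atypical": 0.08, "suspicious": 0.02},
    {"normal": 0.10, "atypical": 0.30, "suspicious": 0.60},
    {"normal": 0.05, "atypical": 0.15, "suspicious": 0.80},
]

risk = slide_risk_score(slide)
print(f"slide-level risk score: {risk:.3f}")
```

A mean treats every cell equally; alternatives such as a top-k average or the fraction of cells above a cutoff would weight rare abnormal cells more heavily, which may matter on slides dominated by benign cells.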

- An interpretable machine learning framework analyzes urothelial cells in urine cytology
- Quantitative cytologic features reflecting diagnostic criteria are systematically extracted
- Transparent models achieve high cell classification accuracy with explainable features
- Slide-level risk scores stratify cytology cases by likelihood of high-grade carcinoma

Health sciences; Cancer; Machine learning

## Linked entities

- **Diseases:** urothelial carcinoma (MONDO:0040679)

## Full-text entities

- **Diseases:** UC (MESH:D014523)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12800639/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12800639/full.md

## References

30 references — full list in the complete paper: https://tomesphere.com/paper/PMC12800639/full.md

---
Source: https://tomesphere.com/paper/PMC12800639