# A predictive model for significant periodontal disease progression: A large-scale cohort study

**Authors:** Georgios S Chatzopoulos, Larry F Wolff

PMC · DOI: 10.4317/medoral.27731 · 2025-10-17

## TL;DR

This study developed a machine learning model to predict which patients are at high risk for significant periodontal disease progression using real-world clinical data.

## Contribution

The study introduces a validated machine learning model for predicting periodontal disease progression using electronic health records.

## Key findings

- The Random Forest model achieved an AUC-ROC of 0.82, 81.6% accuracy, and 79.2% recall in predicting disease progression.
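- Baseline mean CAL, smoking status, and age were the most influential predictors in the Random Forest model; logistic regression additionally associated diabetes with increased odds of progression.
- 28.0% of patients experienced significant periodontal disease progression over a mean follow-up of 34.7 months.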
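
For readers less familiar with the metrics reported above, the following is a minimal Python sketch of how accuracy, recall, and AUC-ROC can be computed from true labels and predicted scores. The function names, the toy labels and scores, and the 0.5 decision threshold are illustrative assumptions; none of them come from the study.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def recall(y_true, y_pred):
    """True positives divided by all actual positives (sensitivity)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)


def auc_roc(y_true, scores):
    """Probability that a random positive case is scored above a random
    negative case, counting ties as one half (pairwise definition of AUC)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data, purely hypothetical: 1 = progressed, 0 = did not progress.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.2, 0.6, 0.4, 0.7, 0.3, 0.1, 0.5]

# Assumed decision threshold of 0.5 to turn scores into class predictions.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(accuracy(y_true, y_pred))
print(recall(y_true, y_pred))
print(auc_roc(y_true, scores))
```

Accuracy and recall depend on the chosen threshold, whereas AUC-ROC summarizes ranking quality across all thresholds, which is one reason the three figures are usually reported together.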

## Abstract

The progression of periodontitis is challenging to predict. This study aimed to develop and validate a machine learning model to identify patients at high risk for significant periodontal disease progression using a large dataset from electronic health records.

This retrospective cohort study included 4,117 patients with at least two comprehensive periodontal examinations separated by a minimum of 24 months. The primary outcome was significant progression, defined as a worsening of mean Clinical Attachment Level (CAL) by 1 mm. A Random Forest Classifier was trained and validated using baseline demographic, behavioral (smoking), systemic (diabetes, high blood pressure), and periodontal (mean probing depth, mean CAL, bleeding on probing) data. Feature importance was analyzed, and a multivariable logistic regression was performed to quantify associations.
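
The inclusion criterion and outcome definition described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: the `PatientExams` record, its field names, and the example patients are hypothetical, and the threshold is read here as "a worsening of at least 1 mm", which the abstract does not state explicitly.

```python
from dataclasses import dataclass

MIN_FOLLOW_UP_MONTHS = 24        # minimum interval between exams, per the abstract
PROGRESSION_THRESHOLD_MM = 1.0   # ASSUMPTION: interpreted as "at least 1 mm" worsening


@dataclass
class PatientExams:
    """Hypothetical record holding two periodontal exams for one patient."""
    baseline_mean_cal_mm: float
    followup_mean_cal_mm: float
    months_between_exams: float


def is_eligible(p: PatientExams) -> bool:
    """Keep patients whose two exams are at least 24 months apart."""
    return p.months_between_exams >= MIN_FOLLOW_UP_MONTHS


def progressed(p: PatientExams) -> bool:
    """Binary outcome: mean CAL worsened (increased) by the threshold or more."""
    change = p.followup_mean_cal_mm - p.baseline_mean_cal_mm
    return change >= PROGRESSION_THRESHOLD_MM


# Made-up example patients, for illustration only.
cohort = [
    PatientExams(2.0, 3.5, 30),   # eligible, progressed (+1.5 mm)
    PatientExams(3.0, 3.25, 36),  # eligible, stable (+0.25 mm)
    PatientExams(2.5, 4.0, 12),   # excluded: follow-up shorter than 24 months
    PatientExams(4.0, 5.0, 24),   # eligible, progressed (+1.0 mm)
]

eligible = [p for p in cohort if is_eligible(p)]
labels = [int(progressed(p)) for p in eligible]
progression_rate = sum(labels) / len(labels)

print(len(eligible), labels, progression_rate)
```

Labels produced this way would serve as the classification target, with the baseline variables listed above as the input features.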

Over a mean follow-up of 34.7 months, 28.0% of patients experienced significant progression. The Random Forest model demonstrated good predictive performance on the unseen test set, achieving an Area Under the Receiver Operating Characteristic Curve (AUC-ROC) of 0.82, an accuracy of 81.6%, and a recall (sensitivity) of 79.2%. The most influential predictors were baseline mean CAL, smoking status, and age. Logistic regression confirmed these findings, showing that the odds of progression were significantly increased by higher baseline CAL (OR=2.45), current smoking (OR=1.98), a 10-year increase in age (OR=1.62), and a diagnosis of diabetes (OR=1.51).
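
To help interpret the odds ratios above, the following hedged Python sketch shows three standard manipulations: converting an odds ratio to a logistic regression coefficient, rescaling a per-unit odds ratio, and multiplying odds ratios for a patient with several risk factors. The helper names are hypothetical. Multiplying odds ratios assumes a model without interaction terms, which the abstract does not confirm, and the unit of increase for baseline CAL is not stated there.

```python
import math

# Odds ratios as reported in the abstract.
ODDS_RATIOS = {
    "baseline_cal": 2.45,      # unit of increase not stated in the abstract
    "current_smoking": 1.98,
    "age_per_10_years": 1.62,
    "diabetes": 1.51,
}

# A logistic regression coefficient is the natural log of its odds ratio.
coefficients = {name: math.log(value) for name, value in ODDS_RATIOS.items()}


def scaled_odds_ratio(or_per_unit: float, units: float) -> float:
    """Odds ratio for a change of `units` units, given the per-unit odds ratio."""
    return or_per_unit ** units


# A 20-year age difference is two 10-year steps: 1.62 ** 2.
or_age_20_years = scaled_odds_ratio(ODDS_RATIOS["age_per_10_years"], 2)

# ASSUMPTION: no interaction terms, so odds ratios multiply for a patient
# who is both a current smoker and has diabetes.
combined_or = ODDS_RATIOS["current_smoking"] * ODDS_RATIOS["diabetes"]


def apply_odds_ratio(reference_risk: float, odds_ratio: float) -> float:
    """Convert a reference probability to odds, scale by the odds ratio,
    and convert back to a probability."""
    odds = reference_risk / (1 - reference_risk)
    new_odds = odds * odds_ratio
    return new_odds / (1 + new_odds)


print(round(or_age_20_years, 4))
print(round(combined_or, 4))
```

Note that `apply_odds_ratio` needs a reference-group risk, which the abstract does not report; any value passed to it would be a hypothetical input rather than a study result. An odds ratio is also not a risk ratio, so an OR of 1.98 does not mean smokers are 1.98 times as likely to progress.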

A machine learning model using real-world clinical data can effectively predict significant periodontal disease progression. The findings confirm that a patient's initial disease severity, smoking status, age, and diabetes are the most critical determinants of future risk, highlighting the model's potential utility in personalizing periodontal care.

## Linked entities

- **Diseases:** periodontitis (MONDO:0005076)

## Full-text entities

- **Diseases:** periodontitis (MESH:D010518), periodontal disease (MESH:D010510), high blood pressure (MESH:D006973), diabetes (MESH:D003920), smoking (MESH:D015208), bleeding (MESH:D006470)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/PMC12983377/full.md

---
Source: https://tomesphere.com/paper/PMC12983377