# Denoising autoencoder framework for reconstructing missing periodontal clinical records

**Authors:** Asok Mathew, Pradeep Kumar Yadalam

PMC · DOI: 10.3389/fdmed.2026.1710316 · Frontiers in Dental Medicine · 2026-03-16

## TL;DR

This paper introduces a denoising autoencoder framework for reconstructing missing periodontal clinical data. Evaluated on a synthetic dataset of 200 virtual patients with 15% of clinical values masked, it offers a practical way to improve data completeness in dental research.

## Contribution

The contribution is an end-to-end workflow (synthetic data creation, masking, training, imputation, and evaluation) that uses a denoising autoencoder to impute missing periodontal records while preserving realistic data distributions.

## Key findings

- The model achieved an overall MAE of 0.61 and RMSE of 0.74 when reconstructing missing periodontal data.
- Imputed values closely matched the original distributions; attachment loss and bleeding on probing showed lower errors than probing depth and furcation involvement.
- The workflow provides a template for handling missing clinical data in periodontal research.

## Abstract

Missing clinical data pose a significant challenge for retrospective analyses and predictive modelling in periodontal research. Traditional imputation methods often overlook the complex correlations among variables and produce implausible values. This study examines the application of generative models to reconstruct missing periodontal clinical records and outlines a comprehensive workflow that spans from data generation to evaluation.

A synthetic periodontal dataset of 200 virtual patients was constructed, capturing realistic distributions of demographic factors (age, gender, smoking, and diabetes) and tooth-level measurements (probing depth, attachment loss, furcation involvement, and bleeding on probing) across eight sites. Missingness was introduced at random to 15% of the clinical variables. A denoising autoencoder with a single hidden layer of 18 neurons was trained to reconstruct the original data from corrupted inputs over 100 epochs. The model learned latent representations of the data and was then used to impute missing entries.
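The masking and training steps described above can be sketched in plain NumPy. This is a minimal illustration, not the authors' implementation: the random stand-in data, the 32-column layout, zero-filling of masked entries, the tanh activation, the mean squared error loss, and the learning rate are all assumptions. Only the 200 patients, the 15% masking rate, the 18 hidden units, and the 100 epochs come from the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in data: 200 patients x 32 features scaled to [0, 1].
# The 32-column layout (4 measures x 8 sites) is an assumption for illustration.
n_patients, n_features, n_hidden = 200, 32, 18
latent = rng.normal(size=(n_patients, 4))
loadings = rng.normal(size=(4, n_features))
X = 1.0 / (1.0 + np.exp(-(latent @ loadings)))  # correlated columns in (0, 1)

# Mask roughly 15% of entries completely at random (True = missing).
mask = rng.random(X.shape) < 0.15
X_corrupt = np.where(mask, 0.0, X)  # zero-fill is an assumed corruption scheme

# Single hidden layer of 18 tanh units with a linear output layer (assumed).
W1 = rng.normal(scale=0.1, size=(n_features, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_hidden, n_features))
b2 = np.zeros(n_features)

def forward(x):
    """Return the hidden activations and the reconstruction for input x."""
    hidden = np.tanh(x @ W1 + b1)
    return hidden, hidden @ W2 + b2

learning_rate = 0.01  # assumed; not reported in the abstract
losses = []
for epoch in range(100):
    hidden, recon = forward(X_corrupt)
    err = recon - X  # the target is the clean data, the input is corrupted
    losses.append(float(np.mean(err ** 2)))
    # Backpropagate the squared error, averaged over patients.
    grad_out = 2.0 * err / n_patients
    grad_hidden = (grad_out @ W2.T) * (1.0 - hidden ** 2)
    # Full-batch gradient descent; in-place updates so forward() sees them.
    W2 -= learning_rate * (hidden.T @ grad_out)
    b2 -= learning_rate * grad_out.sum(axis=0)
    W1 -= learning_rate * (X_corrupt.T @ grad_hidden)
    b1 -= learning_rate * grad_hidden.sum(axis=0)

# Impute: keep observed entries, fill masked entries with the reconstruction.
_, recon = forward(X_corrupt)
X_imputed = np.where(mask, recon, X)
```

Training against the clean matrix `X` is only possible here because the data are synthetic and the original values are known; with real records, the loss would have to be restricted to observed entries.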

Performance was assessed by comparing reconstructed values to the original data using mean absolute error (MAE) and root mean squared error (RMSE) across individual variables and categories. Overall, MAE and RMSE were 0.61 and 0.74, respectively, with attachment loss and bleeding on probing exhibiting lower errors than probing depth and furcation involvement.
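The two error metrics can be made concrete with a short sketch. It assumes that errors are scored only on the entries that were masked, which the abstract does not state explicitly; the function name `masked_errors` and the toy arrays are hypothetical.

```python
import numpy as np

def masked_errors(original, imputed, mask):
    """Return (MAE, RMSE) computed only over entries where mask is True."""
    diff = (imputed - original)[mask]
    mae = float(np.mean(np.abs(diff)))
    rmse = float(np.sqrt(np.mean(diff ** 2)))
    return mae, rmse

# Toy example: two masked entries with errors of +1 and -3.
original = np.array([[2.0, 3.0], [4.0, 1.0]])
imputed = np.array([[2.0, 4.0], [1.0, 1.0]])
mask = np.array([[False, True], [True, False]])

mae, rmse = masked_errors(original, imputed, mask)
```

RMSE is never smaller than MAE on the same set of errors, which is consistent with the reported values of 0.74 and 0.61. Applying the same function to a single column's slice of the arrays would give the kind of per-variable breakdown the abstract describes.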

Distribution comparisons showed that imputed values closely matched original distributions. The approach offers a practical framework for handling missing periodontal data and highlights the potential of generative models to improve data completeness without introducing unrealistic values. Limitations include the simplicity of the network architecture. Future work should explore advanced models, integrate multimodal data, and evaluate on real datasets. The workflow (synthetic data creation, masking, training, imputation, and evaluation) serves as a template for researchers tackling missing clinical data.

## Full-text entities

- **Diseases:** bleeding (MESH:D006470), diabetes (MESH:D003920)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13033649/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13033649/full.md

## References

18 references — full list in the complete paper: https://tomesphere.com/paper/PMC13033649/full.md

---
Source: https://tomesphere.com/paper/PMC13033649