# Reliability of Automated Cephalometric Analysis: A Comparative Assessment of Stratification Strategies Based on Chronological Age Versus Dentition Stage

**Authors:** Anh Thi Ngoc Do, Hung Trong Hoang, Hieu Ngoc Le, Thuy-Trang Thi Ho

PMC · DOI: 10.3390/dj14030167 · 2026-03-12

## TL;DR

This study compares how closely WebCeph, an AI-based cephalometric analysis tool, agrees with manual tracing of lateral cephalometric radiographs when patients are grouped by chronological age versus by dentition stage.

## Contribution

The study shows that dentition stage is a more sensitive and biologically relevant stratification variable than chronological age for evaluating the accuracy of AI-based cephalometric analysis.

## Key findings

- WebCeph showed high agreement with manual tracing overall (ICC > 0.80 for most parameters).
- Chronological age stratification had weak associations with measurement error and small effect sizes.
- Dentition stage stratification revealed significant differences in AI performance (p < 0.05), with accuracy degrading most in the primary-early mixed dentition group.

## Abstract

Objectives: This study evaluated the accuracy of an artificial intelligence (AI)-based cephalometric software (WebCeph version 2.0.0) compared with manual tracing and determined whether stratifying patients by chronological age or dentition stage provides a more clinically relevant assessment of AI accuracy. Methods: Three hundred lateral cephalometric radiographs of Vietnamese patients were traced manually by an orthodontist (reference standard) and analyzed automatically by WebCeph. Intra-observer reliability was validated using ICC and Dahlberg’s error. We analyzed the data using three stratification strategies: (1) Overall; (2) Chronological age (<18, 18–25, >25 years); and (3) Dentition stage (<9 primary-early mixed, 9–12 late mixed, >12 permanent). The primary outcome was the absolute measurement difference (|Δ|), analyzed using the Kruskal–Wallis test and effect size (η²). Results: Overall, WebCeph showed high concordance with manual tracing (ICC > 0.80 for most parameters). Chronological age stratification showed weak associations with measurement error; differences between groups were largely non-significant (p > 0.05) with a small effect size (η² ≈ 0.015). In contrast, the dentition stage revealed significant performance disparities (p < 0.05). Notably, accuracy for the Mandibular Arc (ICC = 0.349) and Mandibular Plane Angle (p = 0.048) degraded significantly in the primary-early mixed group, a vulnerability obscured by chronological age-based stratification. Conclusions: Dentition stage is a more sensitive and biologically relevant predictor of AI accuracy than chronological age. While WebCeph is reliable for permanent dentition, accuracy degrades significantly in the primary-early mixed phase. Clinicians should prioritize manual verification of mandibular and incisor landmarks in mixed-dentition children.
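
The statistics named in the Methods can be illustrated with a minimal Python sketch. It is not the authors' analysis code. It assumes the standard Dahlberg formula, a tie-corrected Kruskal–Wallis H, and the common rank-based effect size η² = (H − k + 1)/(n − k); the abstract does not state which η² variant was used. All function names and the demo values are hypothetical and are not study data.

```python
import math
from typing import Dict, List, Optional, Sequence


def dahlberg_error(first: Sequence[float], second: Sequence[float]) -> float:
    """Dahlberg's method error for repeated tracings: sqrt(sum(d^2) / (2n))."""
    if len(first) != len(second) or not first:
        raise ValueError("need two equally sized, non-empty series")
    squared = sum((a - b) ** 2 for a, b in zip(first, second))
    return math.sqrt(squared / (2 * len(first)))


def abs_differences(manual: Sequence[float], automated: Sequence[float]) -> List[float]:
    """Absolute measurement difference |delta| per radiograph."""
    return [abs(m - a) for m, a in zip(manual, automated)]


def kruskal_wallis(groups: Dict[str, Sequence[float]]) -> Dict[str, Optional[float]]:
    """Tie-corrected Kruskal-Wallis H with eta-squared (assumed variant)."""
    pooled = sorted((v, name) for name, vals in groups.items() for v in vals)
    n, k = len(pooled), len(groups)
    rank_sums = {name: 0.0 for name in groups}
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        for _, name in pooled[i:j]:
            rank_sums[name] += avg_rank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    correction = 1 - tie_term / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    h = 12 / (n * (n + 1)) * sum(
        rank_sums[g] ** 2 / len(groups[g]) for g in groups
    ) - 3 * (n + 1)
    h /= correction
    eta_squared = (h - k + 1) / (n - k)
    # Asymptotic chi-square p-value; the closed form exp(-H/2) holds only
    # for 2 degrees of freedom, i.e. exactly three groups.
    p_value = math.exp(-h / 2) if k == 3 else None
    return {"H": h, "eta_squared": eta_squared, "p_value": p_value}


# Made-up |delta| values (degrees) for one parameter, grouped by dentition stage.
demo = {
    "primary_early_mixed": [2.1, 3.4, 2.8, 4.0],
    "late_mixed": [1.2, 1.9, 1.5, 2.2],
    "permanent": [0.8, 1.1, 0.9, 1.4],
}
print(kruskal_wallis(demo))
```

Running the same function once with age-based groups and once with dentition-based groups for each parameter would mirror the comparison of stratification strategies described above. For small groups, an exact or permutation p-value may be preferable to the chi-square approximation used here.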

## Full-text entities

- **Diseases:** Convexity (MESH:D005413), injury to (MESH:D014947), AI (MESH:C538142), root resorption (MESH:D012391), tooth eruption (MESH:D014079), craniofacial deformities (MESH:D005157), asymmetry (MESH:D005146)
- **Chemicals:** acetate (MESH:D000085)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13025524/full.md

---
Source: https://tomesphere.com/paper/PMC13025524